Add ZipDirectory and Unzip tasks #3291

Merged
jeffkl merged 16 commits into dotnet:master from jeffkl:zip
May 22, 2018

Contributor

@jeffkl jeffkl commented May 9, 2018

The ZipDirectory task zips up a whole directory. It will fail if the destination file already exists. Zipping individual files is a lot harder because you have to calculate a base path for all files and make zip entries relative to that. The code in the CLR that does this is very efficient because it reuses string buffers, and I didn't want to have to replicate it. So for now there's only ZipDirectory instead of a more generic Zip.

Unzip unzips files from an archive to a directory.

  • Defaults to skip unzipping files that are already up-to-date
  • Logs every file that was unzipped
  • Supports cancellation
  • Supports overwriting read-only files
  • Supports unzipping multiple files to the same directory
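
As a rough usage sketch of the behavior listed above: the target name, item name, and paths are hypothetical, and the parameter names (SourceFiles, DestinationFolder, SkipUnchangedFiles, OverwriteReadOnlyFiles) are assumed to match the task as it shipped rather than taken from this thread.

```xml
<Project>
  <Target Name="UnpackArchives">
    <ItemGroup>
      <!-- Hypothetical location: several archives extracted into one folder. -->
      <ArchiveToUnpack Include="$(MSBuildProjectDirectory)\archives\*.zip" />
    </ItemGroup>

    <!-- Skip entries that are already up-to-date and replace read-only files. -->
    <Unzip
      SourceFiles="@(ArchiveToUnpack)"
      DestinationFolder="$(MSBuildProjectDirectory)\obj\unpacked"
      SkipUnchangedFiles="true"
      OverwriteReadOnlyFiles="true"
      />
  </Target>
</Project>
```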

Fixes #1781

Comment thread src/Tasks/Unzip.cs Outdated
Contributor

Do we always want this? Seems like it could be a giant list in some cases and use up a lot of memory and may or may not be needed? The output folder might be enough?

Contributor Author

It's what <Copy /> does so I thought I'd ... 😑 ... 😎 copy it.

Technically you could copy a ton of files as well

<ItemGroup>
  <Files Include="Folder\**" />
</ItemGroup>

<Copy
  SourceFiles="@(Files)"
  DestinationFolder="Foo"
  />

And we'll waste all of the memory. What should I do?


@jeffkl Would it save memory to make _unzippedFiles and _destinationFiles a collection of tuples of sourceTaskItem and destinationPath.FullName? That way if you access the output properties DestinationFiles or UnzippedFiles you could build the full list of TaskItems on demand?

Contributor Author

We decided to just remove it. If necessary, we can add the functionality later.

@jeffkl jeffkl changed the title from "WIP: Add ZipDirectory and Unzip tasks" to "Add ZipDirectory and Unzip tasks" May 11, 2018
@jeffkl jeffkl force-pushed the zip branch 2 times, most recently from 8a91c23 to 401c24d May 14, 2018 14:33
Member

I'm a bit worried about this interface. It's pretty much impossible to get correct incremental behavior out of this. You'd have to:

  • Enumerate the contents of the folder just before the target that invokes ZipDirectory.
  • Additionally compute a hash of those items to avoid the removed-an-input problem.
  • Use the new stuff in the target that calls this.

Accepting Item[] would be more complex in implementation but easier to make correct.

Contributor Author

I discussed this with @AndyGerlicher and we're concerned more about the complexity required to make it incremental out-of-the-box. So we're going to leave it as-is.

Contributor

I think if we wrote a target that used this we would need to make it incremental. But given that it zips a directory, it doesn't seem worth the complexity to implement at the task level. It might be worth adding a sample to the docs page: a target that zips your output folder incrementally.
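
A minimal sketch of what such a target might look like. All target, property, and item names here are hypothetical, and the DestinationFile parameter name is an assumption. It re-zips only when a file under the folder is newer than the archive. It does not cover the removed-an-input case raised earlier in this thread, which would still need the extra hashing step.

```xml
<Project>
  <PropertyGroup>
    <!-- Hypothetical locations; the archive must live outside the zipped folder. -->
    <ZipSourceDir>$(MSBuildProjectDirectory)\bin\publish\</ZipSourceDir>
    <ZipFile>$(MSBuildProjectDirectory)\bin\publish.zip</ZipFile>
  </PropertyGroup>

  <!-- Enumerate the folder just before zipping so Inputs reflects its current contents. -->
  <Target Name="_CollectZipInputs">
    <ItemGroup>
      <_ZipInput Include="$(ZipSourceDir)**\*" />
    </ItemGroup>
  </Target>

  <!-- Skipped when the archive is newer than every collected input. -->
  <Target Name="ZipOutputFolder"
          DependsOnTargets="_CollectZipInputs"
          Inputs="@(_ZipInput)"
          Outputs="$(ZipFile)">
    <!-- ZipDirectory fails if the destination exists, so remove a stale archive first. -->
    <Delete Files="$(ZipFile)" />
    <ZipDirectory SourceDirectory="$(ZipSourceDir)" DestinationFile="$(ZipFile)" />
  </Target>
</Project>
```

This relies on MSBuild running DependsOnTargets before it compares Inputs and Outputs, so the item list built in the helper target is visible to the up-to-date check.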

@jeffkl jeffkl force-pushed the zip branch 3 times, most recently from 2abd49c to d7e7f23 May 14, 2018 16:07
Comment thread src/Tasks/ZipDirectory.cs Outdated
Member

Does this not deserve an Overwrite param?

Comment thread src/Tasks/ZipDirectory.cs Outdated
Member

Why not just create? Didn't we make something create instead of erroring in this situation fairly recently?

Contributor Author

This is the directory to zip up, so it must exist

Member

Yes, that . . . is definitely true 😬

Comment thread src/Tasks/Unzip.cs Outdated
Member

Do you think this is sufficiently long-running/IO intensive that we should Yield() here?

Contributor Author

Yeah the zip file could contain a large file, I'll add it

Comment thread src/Tasks/Unzip.cs Outdated
Member

Is this used?

Contributor Author

It was at one point, good catch

Comment thread src/Tasks/Unzip.cs Outdated
Member

Worth using the CopyToAsync overload that takes a CancellationToken to speed up canceling if we're stuck in a huge file? Possibly not . . .

Comment thread src/Tasks/Unzip.cs Outdated
Member

Make this a CancellationTokenSource instead and pass around a CancellationToken?

Contributor Author

jeffkl commented May 15, 2018

@Microsoft/msbuild-maintainers this is ready for final review

Member

Don't forget to manually port these to the internal VS repo after this merges.

Comment thread src/Tasks.UnitTests/Unzip_Tests.cs Outdated
Member

Use

[PlatformSpecific(TestPlatforms.Windows)]

instead.

Comment thread src/Tasks/Unzip.cs Outdated
Contributor

Why not eagerly exit to reduce the time it takes to finish the build on error?

Contributor

Ah, to show all errors at once rather than one at a time. The alternative is to complicate things and check everything first, and do the extraction second.



Comment thread src/Tasks/Unzip.cs Outdated
Contributor

Shouldn't Reacquire be in a finally block? Or does the engine handle this?

Comment thread src/Tasks/Unzip.cs Outdated
Contributor

What happens if there are files in multiple archives with the same path? Should there be an option to warn or error about this?

Contributor Author

I think it's okay to just ignore it, do you want me to log something?

Contributor

I guess people can use the diagnostic log to see the task inputs, so I guess they already have insight into the sources.

Comment thread src/Tasks/Unzip.cs Outdated
Contributor

Should it be made readonly after the write to preserve user data? :)

Contributor Author

I'm not sure. I'd assume that if the file is readonly in the archive it should be readonly after extraction but I don't see attributes like that in the ZipArchiveEntry class... It seems a little weird to be setting the readonly attribute based on what the file used to have. Thoughts?

Comment thread src/Tasks/Unzip.cs Outdated
Contributor

@cdmihai cdmihai May 16, 2018

nit: named constant for the buffer size.

Comment thread src/Tasks/ZipDirectory.cs Outdated
Contributor

The other task wraps each IO in a try catch. Should this one be wrapped too?

Comment thread src/Tasks/Unzip.cs Outdated
Contributor

Shouldn't the returned task be awaited before the entire task returns?

Comment thread src/Tasks/Unzip.cs Outdated
Contributor

Curious what happens if another file exists in the destination and is longer than the archive entry. Does this override or does it just copy over the source stream, leaving some trailing bytes from the destination? The CopyTo documentation is a bit unclear on this.

Contributor

Oh, I did not notice the FileMode.Create. Never mind.



Member

rainersigwald commented May 21, 2018 via email

Contributor Author

jeffkl commented May 22, 2018

@cdmihai can you please review my changes to address your comments? I think this is ready to merge but wanted to make sure you were happy.

Comment thread src/Tasks/ZipDirectory.cs
[Required]
public ITaskItem SourceDirectory { get; set; }

public override bool Execute()
Contributor

Can you please make this yieldable as well?

@jeffkl jeffkl merged commit f90c73c into dotnet:master May 22, 2018
@jeffkl jeffkl deleted the zip branch May 22, 2018 20:36
Comment thread src/Tasks/Unzip.cs
{
foreach (ZipArchiveEntry zipArchiveEntry in sourceArchive.Entries.TakeWhile(i => !_cancellationToken.IsCancellationRequested))
{
FileInfo destinationPath = new FileInfo(Path.Combine(destinationDirectory.FullName, zipArchiveEntry.FullName));
Contributor

Sorry to comment after you've just merged, but would you consider normalizing directory separators? One of the most common errors I've encountered is that zips created on Windows using \ as a directory separator will not extract correctly on Linux/macOS. Instead \ becomes part of the filename.

This common error is something we worked around in the unzip task used in the .NET Core build system: see https://github.com/dotnet/arcade/blob/0d117c0b3649565a6d9aacf9f435e29ab67c3c0d/src/Microsoft.DotNet.Build.Tasks.IO/src/UnzipArchive.cs#L64-L76

Contributor Author

jeffkl commented May 22, 2018

@natemcmaster I'll double check but I think my unit tests cover that scenario.

Comment thread src/Tasks/ZipDirectory.cs
try
{
Log.LogMessageFromResources(MessageImportance.High, "ZipDirectory.Comment", sourceDirectory.FullName, destinationFile.FullName);
ZipFile.CreateFromDirectory(sourceDirectory.FullName, destinationFile.FullName);

I think we need to make the "includeBaseDirectory" argument an option.


7 participants