Skip to content

Comments

new function: std.file.copyRecurse#3948

Closed
aG0aep6G wants to merge 4 commits intodlang:masterfrom
aG0aep6G:copyRecurse
Closed

new function: std.file.copyRecurse#3948
aG0aep6G wants to merge 4 commits intodlang:masterfrom
aG0aep6G:copyRecurse

Conversation

@aG0aep6G
Copy link
Contributor

For when you need to copy a directory.

Possible issues:

  • Timestamps and attributes are not copied for symlinks. See NOTEs in the code.
  • Copying stops on the first error. No way to ignore errors and just copy as much as possible.
  • Creation times are not preserved (Windows).
  • Asserting that there are no symlinks on Windows, because symlink is Posix-only. But attrIsSymlink can apparently be true on Windows.

std/file.d Outdated
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the function useful but I don't like the interface.

why not copyDirectory(string from, string to, bool recursive = true, PreserveAttributes preserve = preserveAttributesDefault) ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would recursive be optional? Just use copy if that's what you want. Copying a directory is recursive by its very nature; otherwise, you'd just be doing mkdir.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface shouldn't require immutable characters in from and to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not copyDirectory(string from, string to, bool recursive = true, PreserveAttributes preserve = preserveAttributesDefault) ?

I suppose with recursive = false this would copy files in the directory and create sub-directories, but it wouldn't step into the sub-directories and copy their contents, right?

Is there a need for that? As far as I'm aware, file managers don't supply that functionality. I don't see where one would need it.

@jmdavis
Copy link
Member

jmdavis commented Jan 25, 2016

Asserting that there are no symlinks on Windows, because symlink is Posix-only. But attrIsSymlink can apparently be true on Windows.

Windows does have symlinks now (starting with Vista, I believe - but we specifically support Windows 7 and newer). I'm not very familiar with how they work exactly, but a quick search shows a number of articles on them. std.file.symlink and std.file.readLink should probably be updated to support Windows (and that may be necessary for copyRecurse to work properly with symlinks on Windows), but in principle, copyRecurse should be able to work with symlinks on Windows.

@jmdavis
Copy link
Member

jmdavis commented Jan 25, 2016

Copying stops on the first error. No way to ignore errors and just copy as much as possible.

I'm not sure that it's critical for a first round of this (and we do a poor job with handling errors like this in general in std.file - e.g. with dirEntries), but we really should have a way to handle errors with a function like copyRecurse, or it becomes useless outside of small cases. I don't know what the best way to deal with it would be, but the first thought that comes to mind would be a delegate which was called when an error occurred so that it could decide what to do. A default implementation which does not have any error-handling would still be desirable though and is effectively what you have now.

std/file.d Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to here looks like a noop

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason this uses mkdirRecurse instead of mkdir every time? Shouldn't it do the extra logic just for the root target directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to here looks like a noop

Oh yeah, that's a left-over from when I tried to support ranges. Fixed.

Is there any reason this uses mkdirRecurse instead of mkdir every time? Shouldn't it do the extra logic just for the root target directory?

mkdir throws when the directory exists; mkdirRecurse doesn't. Could replace with if (!exists(to)) mkdir(to);, I guess.

@JakobOvrum
Copy link
Contributor

This function allocates a lot of short-lived GC memory, one chunk for each unique path in the tree. This can and should be avoided.

A related issue is that it should ideally work with all range paths, not just array paths. This enhancement has been done for copy and much of Phobos. This helps with pushing allocation decisions downward.


I suspect this approach gives suboptimal performance, especially when from and to are on different physical devices, but this could be measured. Instead of trying to reimplement rsync in the standard library, we could simply warn about this in the documentation and suggest that users invoke rsync (or whatever) instead when performance is desired.

std/file.d Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std.path needs Delphi's ExcludeTrailingPathDelimiter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. In the meantime, I should probably replace buildNormalizedPath with something that only checks for the trailing separator.

@aG0aep6G aG0aep6G force-pushed the copyRecurse branch 2 times, most recently from 85985fa to a75a37b Compare January 25, 2016 18:19
@aG0aep6G
Copy link
Contributor Author

copyRecurse should be able to work with symlinks on Windows.

Agreed.

we really should have a way to handle errors with a function like copyRecurse, or it becomes useless outside of small cases. I don't know what the best way to deal with it would be, but the first thought that comes to mind would be a delegate which was called when an error occurred so that it could decide what to do.

I'd been playing around with a version that collects all thrown exceptions and throws a single one at the end of the operation. I.e., it would copy as much as possible and then inform the caller via exception about what failed. The delegate thing would be more flexible, so that sounds good to me.

This function allocates a lot of short-lived GC memory, one chunk for each unique path in the tree. This can and should be avoided.

A related issue is that it should ideally work with all range paths, not just array paths. This enhancement has been done for copy and much of Phobos. This helps with pushing allocation decisions downward.

I had tried that, but got frustrated when some of std.file's functions would take ranges but others only const arrays (mkdir vs mkdirRecurse). dirEntries even demands a string. I'll try and reduce the allocations. Any non-obvious pointers?

I suspect this approach gives suboptimal performance, especially when from and to are on different physical devices, but this could be measured. Instead of trying to reimplement rsync in the standard library, we could simply warn about this in the documentation and suggest that users invoke rsync (or whatever) instead when performance is desired.

You mean we could try and skip files that haven't changed? I think we shouldn't try to be that clever here. Keep it simple, just do what the name says: copy everything.

@aG0aep6G
Copy link
Contributor Author

I'm a little confused about the changelog. I had to rebase and new changelog entries showed up. They seem to be the ones at http://dlang.org/changelog/2.070.0.html. Shouldn't the changelog for 2.070 be in the stable branch now?

@JakobOvrum
Copy link
Contributor

I had tried that, but got frustrated when some of std.file's functions would take ranges but others only const arrays (mkdir vs mkdirRecurse). dirEntries even demands a string. I'll try and reduce the allocations. Any non-obvious pointers?

Those functions have yet to be improved. dirEntries is the only tough one, but one we have to solve to get @nogc going.

You mean we could try and skip files that haven't changed?

No, a good tool reads and writes asynchronously.

std/file.d Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just collapse this if-statement into the enforce condition

@aG0aep6G aG0aep6G force-pushed the copyRecurse branch 3 times, most recently from b48a03b to 67cb279 Compare February 1, 2016 00:08
@aG0aep6G
Copy link
Contributor Author

aG0aep6G commented Feb 1, 2016

Somewhat larger update. New commits start at c178944 ("remove @safe from template setTimes").

Summary:

  • Rangified readLink, symlink, and (more or less) copyRecurse.
  • Removed @safe from setTimes, readLink, and symlink. Attribute reference can take care of that.
  • Fixed wrong documentation of symlink.

There are still two instances of stringification in copyRecurse (aside from when throwing exceptions): mkdirRecurse and dirEntries. I'm afraid I might not have the will power to rangify those.

Overall, I'm not very confident in my rangification of copyRecurse. If it's rubbish, please advise.

Other open issues:

  • Symlinks on Windows - Since we don't support anything older than Windows 7, and Windows 7 supports symlinks, we should support symlinks on Windows. I couldn't find any declaration of CreateSymbolicLink, though. What do I do here?
  • Attributes and timestamps of symlinks on Posix. Similar situation to the Windows thing, but it's less clear if we can support it or if the needed Posix functions are too new. Those would be fchmodat and lutimes, as far as I could google.

By the way, I've noticed that std.internal.cstring.tempCString is unsafe and @trusted. Not cool.

@PetarKirov
Copy link
Member

@aG0aep6G CreateSymbolicLink should be included in winbase.d, but it looks like druntime has an old version of the Windows SDK headers and we will need to update it to include APIs supported by newer version of Windows.

@CyberShadow
Copy link
Member

On Windows, we should create junctions (not symlinks) for directories, as that doesn't require elevated privileges by default.

Here's my code to create junctions and symlinks, feel free to submit to std.file:

https://github.com/CyberShadow/ae/blob/b8fc2b12a99f4554b8822938d8c548abfff33f5e/sys/file.d#L556-L658

@aG0aep6G
Copy link
Contributor Author

aG0aep6G commented Feb 3, 2016

I've split off the prerequisite parts of this, and made individual pull requests:

@DmitryOlshansky
Copy link
Member

@aG0aep6G needs rebase

@aG0aep6G
Copy link
Contributor Author

@aG0aep6G needs rebase

This needs more than a rebase. For a start, it needs #3967.

@wilzbach
Copy link
Contributor

This needs more than a rebase. For a start, it needs #3967.

aG0aep6G Your wish has been heard - wanna rebase? ;-)

copyRecurse differs from copy in that it accepts directories and copies
them recursively.
@aG0aep6G
Copy link
Contributor Author

aG0aep6G Your wish has been heard - wanna rebase? ;-)

Rebased, but I don't think this is ready to be merged.

@DmitryOlshansky
Copy link
Member

Rebased, but I don't think this is ready to be merged.

Needs a new rebase, what else do you have on your mind?

@aG0aep6G
Copy link
Contributor Author

aG0aep6G commented May 5, 2016

what else do you have on your mind?

  1. Symlinks on Windows

    I just tried to add CreateSymbolicLink to winbase.d here, but either I messed up the declaration, or it's not in dmd's kernel32.lib. Giving up for today. Help would be appreciated.

    CyberShadow has provided some code, but it looks rather daunting. Symlinks, junction points, reparse points ... I'd need time and motivation to educate myself on this Windows stuff.

    Of course we could also decide that we don't need to support symlinks on Windows for now.

  2. Attributes of symlinks on Posix

    I don't know if we can rely on the needed functions fchmodat and lutimes. Again, not supporting this is an option as well.

  3. Maybe try and cut down more on allocations.

    text is called in the calls to mkdirRecurse (only root directory) and dirEntries (every directory). Could maybe rangify those.

    Multiple functions are called on src.save/dst.save of which each calls tempCString. Could maybe call tempCString once and pass the result.

@andralex
Copy link
Member

andralex commented Oct 9, 2016

I'm okay with the addition assuming some non-tenuous semantics can be defined. There are some implementation issues that I'd like to see improved.

@wilzbach wilzbach added this to the 2.075.0 milestone Mar 5, 2017
@wilzbach
Copy link
Contributor

wilzbach commented Mar 5, 2017

Rebased, but I don't think this is ready to be merged.

@aG0aep6G so I think this PR would be a lot more important than the CHEATSHEET one.
Do you have some time to finish it up?

@aG0aep6G
Copy link
Contributor Author

aG0aep6G commented Mar 5, 2017

@aG0aep6G so I think this PR would be a lot more important than the CHEATSHEET one.
Do you have some time to finish it up?

The issues above still stand. If someone can give me directions on those, sure, I can work on this. Otherwise I'm willing to admit defeat here.

@MetaLang
Copy link
Member

MetaLang commented Aug 6, 2017

@aG0aep6G are you still working on this or can I close it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.