Skip to content

Conversation

@jab
Copy link
Contributor

@jab jab commented Aug 16, 2018

It looks like the original author of #2977 lost interest, and it would still be really useful to have this, so I thought I'd pick it back up.

https://bugs.python.org/issue20849

@nanonyme
Copy link

nanonyme commented Aug 16, 2018

Hey, could you please fix the news check while waiting for a review? I just hit this same thing myself and was about to file a PR maybe ten minutes after you with basically same contents.

I to be honest think there's some merit to the critique on the ticket while this was my first idea as well. If you compare cp to shutil.copytree, cp has a --no-clobber so it creates directories but allows not overwriting existing files. While yours is a very backwards-compatible interpretation, I kinda think cp's way of doing it makes more sense than both the old and new Python idea of copytree

@merwok
Copy link
Member

merwok commented Aug 16, 2018

Hello! Your branch contains unwanted changes to get-pip and a patch file.

If you’re using git on the command line, a nice tip is to use git add -p instead of git add . to choose what changes should be in your next commit.

Graphical git clients usually have a diff hunk selection screen that serves the same purpose.

@jab jab force-pushed the copytree-exist_ok branch from 63b1c9b to 98e3668 Compare August 16, 2018 20:47
@jab
Copy link
Contributor Author

jab commented Aug 16, 2018

@merwok wrote:

Hello! Your branch contains unwanted changes to get-pip and a patch file.

Sorry about that! Fixed in the latest revision.

If you’re using git on the command line, a nice tip is to use git add -p instead of git add . to choose what changes should be in your next commit.

Nice, thanks for the tip!

@nanonyme wrote:

Hey, could you please fix the news check while waiting for a review?

Done in the latest revision.

I just hit this same thing myself and was about to file a PR maybe ten minutes after you with basically same contents.

No way! Solidarity, @nanonyme. Extra funny coincidence given how long this issue has been around.

I to be honest think there's some merit to the critique on the ticket while this was my first idea as well. If you compare cp to shutil.copytree, cp has a --no-clobber so it creates directories but allows not overwriting existing files. While yours is a very backwards-compatible interpretation, I kinda think cp's way of doing it makes more sense than both the old and new Python idea of copytree

I agree. Will wait for a maintainer to weigh in on how to proceed.

@jab jab force-pushed the copytree-exist_ok branch 3 times, most recently from e783b52 to 24f03ee Compare August 17, 2018 03:28
@1st1
Copy link
Member

1st1 commented Sep 7, 2018

@giampaolo @vstinner The PR looks OK to my eyes, should we merge it?

@jab
Copy link
Contributor Author

jab commented Sep 15, 2018

Thanks for reviewing @1st1! Since the others you pinged haven’t been able to respond yet, is there anyone else who should review before this can be merged? The rest of the month will be busy for me so hoping to land while I still have the spare cycles.

@jab
Copy link
Contributor Author

jab commented Dec 15, 2018

Any maintainers up for giving this a look some time soon? Cheers

3 months ago @1st1 wrote:

@giampaolo @vstinner The PR looks OK to my eyes, should we merge it?

@giampaolo
Copy link
Contributor

giampaolo commented Dec 15, 2018

As I said on the bug tracker I think the new argument should be called "dirs_exists_ok" (EDIT: "dirs_exist_ok") instead of "exists_ok" so that it's clear that the behavior does not affect files. Other than that the patch is fine with me but it should be updated as it has conflicts with master.

@jab jab force-pushed the copytree-exist_ok branch from 9a1016b to 2279717 Compare December 20, 2018 00:15
@jab
Copy link
Contributor Author

jab commented Dec 20, 2018

Thanks for replying @giampaolo! This has been labeled as "awaiting review" since Aug 16 and nobody commented on bpo-20849 since I did on Aug 17, so I didn't realize you were awaiting changes on this before it could be merged.

should be updated as it has conflicts with master

Rebased on latest master. (Had to resolve conflicts from your bpo-33695 changes, which I hadn't noticed before – nice changes!) All tests are passing on CI (including the test_copytree_exist_ok this adds).

I think the new argument should be called "dirs_exists_ok" instead of "exists_ok" so that it's clear that the behavior does not affect files

I don't follow – this does affect files. Are you mistakenly thinking that this patch implements option (b) described in Éric Araujo's comment (as he said the original patch did), whereas this patch actually implements option (a) (i.e. it behaves like BSD cp)?

From the docs I wrote for this: "If exist_ok is true, the destination directory may already exist, and existing files in the destination tree will be overwritten by matching files in the source tree." (emphasis added)

Also:

  1. Matching the name of the "exist_ok" argument that os.makedirs() accepts makes this new API easier for users to remember, especially upon realizing that it's passed straight through to the underlying os.makedirs(dst, exist_ok=exist_ok) call.
  2. "dirs_exists_ok" is grammatically incorrect: "dirs" is plural and "exists" is singular.
  3. The alternatives "dir_exists_ok", "dst_exists_ok", and "dirs_exist_ok" are grammatically correct, but in each case, the longer name suggests greater precision in describing the behavior (i.e. "that's all this does"), when in fact it affects existing files too.

So can this be merged as-is before there are any further conflicts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I assumed this can't land in a 3.7.x release. Can it?)

@jab
Copy link
Contributor Author

jab commented Dec 20, 2018

$ mkdir existing_dir
$ mkdir src_dir
$ echo "will be replaced" > existing_dir/file
$ echo "replaced" > src_dir/file
$ # behaves like `cp -R src_dir/ existing_dir` on macOS/BSD (note trailing slash):
$ ./python.exe -c "import shutil; shutil.copytree('src_dir', 'existing_dir', exist_ok=True)"
$ cat existing_dir/file
replaced

(Unfortunately GNU cp apparently treats the trailing slash inconsistently. See e.g. https://jondavidjohn.com/linux-vs-osx-the-cp-command/ )

@giampaolo
Copy link
Contributor

giampaolo commented Dec 20, 2018

shutil.copytree always overwrites existing destination files. My point is that exists_ok is ambiguous in that it does not specify who's the target, whether files or directories. One may erroneously think that exists_ok=False means that src and dst files with the same name will be skipped or something. dirs_exist_ok explicitly tells who's the target and removes that ambiguity.

You're right that "exist_ok" argument matches os.makedirs's, but the main character here is shutil.copytree, not os.makedirs. The usage of os.makedirs is just an internal implementation detail.

@nanonyme
Copy link

Just a bit of an out-of-the-box thought: how about replace_existing/overwrite instead of all these exist_ok and dirs_exist_ok?

@giampaolo
Copy link
Contributor

Same problem (doesn't tell who's the target).

@jab
Copy link
Contributor Author

jab commented Dec 21, 2018

My point is that exists_ok is ambiguous in that it does not specify who's the target, whether files or directories.

I think using the word "target" instead of "destination" is muddying the waters in this discussion. If we stick to using the word "destination" instead of "target", which is the term used in the function description itself, then isn't it clear that we're talking about the dst argument, which is a directory?

One may erroneously think that exists_ok=False means that src and dst files with the same name will be skipped or something.

I'm having trouble making any sense of this unless you meant "exist_ok=True" here. Did you?

If so, then even with "dirs_exist_ok=True", couldn't that still be misinterpreted as "files with the same name might be skipped since this only applies to directories"?

@giampaolo
Copy link
Contributor

giampaolo commented Dec 21, 2018

I think using the word "target" instead of "destination" is muddying the waters in this discussion. If we stick to using the word "destination" instead of "target", which is the term used in the function description itself, then isn't it clear that we're talking about the dst argument, which is a directory?

No, it's not, because "exists_ok" is meant for directories only and what copytree is about is copying files and directories, so it's not clear what "exists_ok" argument operates on unless you read the doc. Also, the argument operates against all sub-directories (plural) with the same name which can be encountered while walking down the tree, not only dst, and that is why the argument name cannot be singular (remember: it's a recursive function). This ambiguity with the argument name has already been pointed out on the tracker and in here multiple times (see here) so I'm really not sure how else to explain it.

If so, then even with "dirs_exist_ok=True", couldn't that still be misinterpreted as "files with the same name might be skipped since this only applies to directories"?

It cannot be misinterpreted because of the "dirs_" bit. It's clear it's about directories and not about files or both.

@jab jab force-pushed the copytree-exist_ok branch from 2279717 to 6cf8e53 Compare December 21, 2018 13:46
@jab jab changed the title bpo-20849: add exist_ok to shutil.copytree bpo-20849: add dirs_exist_ok to shutil.copytree Dec 21, 2018
@jab
Copy link
Contributor Author

jab commented Dec 21, 2018

I have renamed the argument dirs_exist_ok in the latest revision. CI is all green.

I'm still not following your logic but prefer to defer so this can be merged.

This ambiguity with the argument name has already been pointed out on the tracker and in here multiple times

To be fair, that was in the context of someone else's patch from 5 years ago, which implemented behavior (b). This patch implements behavior (a).

It cannot be misinterpreted because of the "dirs_" bit. It's clear it's about directories and not about files or both.

I just meant that now the argument name says nothing about what happens to files. Before you were imagining a user who didn't read the docs and jumping to the wrong conclusion about what happens to files. I don't think the new name really prevents that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would simply rephrase it as something along these lines:

*dirs_exist_ok* dictates whether to raise an exception in case *dst* as well as 
any missing parent directory already exists. 

I would not talk about the overwriting of files because the argument specifically targets directories and has nothing to do with files. When copying a big directory another process may create files in the dst directory which conflicts with the ones living in src, and they are always overwritten even when dirs_exist_ok=False.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior of dirs_exist_ok=False (raising an exception) is already documented in the previous paragraph, so your rephrasing of this paragraph seems redundant. The purpose of this paragraph was to document the behavior of dirs_exist_ok=True.

I've just removed the part that talks about files.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@jab
Copy link
Contributor Author

jab commented Dec 21, 2018

@bedevere-bot I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@giampaolo: please review the changes made to this pull request.

@jab jab force-pushed the copytree-exist_ok branch from acbbd3a to 8a0c7d2 Compare December 27, 2018 19:39
@jab jab force-pushed the copytree-exist_ok branch from 8a0c7d2 to 4a9c7ca Compare December 28, 2018 00:15
@jab jab force-pushed the copytree-exist_ok branch from 4a9c7ca to dbb9162 Compare December 28, 2018 14:56
@giampaolo
Copy link
Contributor

Looks good. Merging.

@giampaolo giampaolo merged commit 9e00d9e into python:master Dec 28, 2018
@jab jab deleted the copytree-exist_ok branch December 28, 2018 18:31
@jab
Copy link
Contributor Author

jab commented Dec 28, 2018

Thanks @giampaolo! Can this be considered for backporting to 3.7.x? If so I would be happy to submit another PR for that.

@giampaolo
Copy link
Contributor

;-)
No, it cannot be backported to 3.7 because this is a new feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants