dvc: preserve exec bit for tracked files #5061
| self.is_user_executable = os.path.isfile(path) and os.access( | ||
| path, os.X_OK | ||
| ) |
There was a problem hiding this comment.
The file might not exist here or might not be in the same state as the rest of the info. In this place we should load the isexec from the info provided, e.g. how metric is loaded above.
The actual stat should happen in save() method, as pointed out in #5036 (comment) .
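A minimal sketch of the split suggested above: read the flag from the already-parsed info when loading, and touch the filesystem only in save(). `SketchOutput` and its constructor signature are hypothetical and are not DVC's actual classes; only the idea (load from info, stat in save) comes from this review.

```python
import os


class SketchOutput:
    """Hypothetical stand-in for an output object; not DVC's real class."""

    PARAM_ISEXEC = "isexec"

    def __init__(self, path, info=None):
        self.path = path
        info = info or {}
        # Loading: trust the recorded value; the file may not exist yet
        # or may be in a different state than the rest of the info.
        self.isexec = bool(info.get(self.PARAM_ISEXEC, False))

    def save(self):
        # Saving: the workspace file is expected to exist, so stat it here.
        self.isexec = os.path.isfile(self.path) and os.access(
            self.path, os.X_OK
        )
```

With this layout, constructing the object never fails or lies when the file is missing; the value only changes when save() actually inspects the workspace.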
There was a problem hiding this comment.
Thank you for the clarification. I moved stat to save().
| ret[self.PARAM_IS_USER_EXECUTABLE] = self.is_user_executable | ||
There was a problem hiding this comment.
same as in other fields above
| ret[self.PARAM_IS_USER_EXECUTABLE] = self.is_user_executable | |
| if self.is_user_executable: | |
| ret[self.PARAM_IS_USER_EXECUTABLE] = self.is_user_executable | |
There was a problem hiding this comment.
Yes, I had left it outside the if so that the test I had at that stage would pass; it is behind the if now.
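The suggested pattern, writing an optional field only when it is set, can be sketched as follows. The `dumpd` function here is a hypothetical free function used for illustration, not the method from the PR; the key name is the one used at this point in the review.

```python
PARAM_IS_USER_EXECUTABLE = "is_user_executable"


def dumpd(path, is_user_executable=False):
    """Hypothetical serializer: emit optional fields only when they are set."""
    ret = {"path": path}
    if is_user_executable:
        # Falsy values are omitted, so existing files stay unchanged.
        ret[PARAM_IS_USER_EXECUTABLE] = is_user_executable
    return ret
```

Omitting the key for non-executable outputs keeps previously written files byte-identical, which is the same treatment the other optional fields get.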
| PARAM_PLOT_HEADER = "header" | ||
| PARAM_PERSIST = "persist" | ||
| PARAM_DESC = "desc" | ||
| PARAM_IS_USER_EXECUTABLE = "is_user_executable" |
There was a problem hiding this comment.
I guess we could even use exec: true/false, as it is specified for a particular output and it doesn't seem like we will have another use for exec here in the future. Could also use isexec. The current is_user_executable seems a bit long. WDYT?
There was a problem hiding this comment.
Also, the "is" prefix is very much unnecessary. exec or executable is fine imo.
There was a problem hiding this comment.
@efiop @skshetry good suggestion about a shorter variable name. It fits the style of this project well. I hesitated a lot, probably too much, between exec and isexec and decided to go with exec. Unfortunately, as a result, I've got a pylint error:
W0622: Redefining built-in 'exec' (redefined-builtin)
so I've started using isexec instead. Please let me know if you have a stronger opinion about using executable instead.
There was a problem hiding this comment.
good suggestion about a shorter variable name.
The suggestion was about the keyword in the dvc.yaml/dvcfile, not the variable name. I have no strong opinion on exec versus isexec, but the decision should not be based on a pylint complaint.
There was a problem hiding this comment.
I think that isexec seems better than exec as the latter sounds like "execute this output: true" which is confusing.
efiop
left a comment
There was a problem hiding this comment.
Thanks for the PR! The plan looks pretty good 🙂
Also fixed all tests in test_add.py
| class HashInfo: | ||
| PARAM_SIZE = "size" | ||
| PARAM_NFILES = "nfiles" | ||
| PARAMS_TO_IGNORE = ("isexec",) |
There was a problem hiding this comment.
@efiop I'm not sure if this is a good approach and there could be a better way to ignore isexec from .dvc file in HashInfo. I plan to look at a better way to handle it later but maybe you have some quick suggestions?
There was a problem hiding this comment.
@dudarev Take a look at loadd_from in dvc/output/__init__.py. You just need to pop the value for isexec from info, same as we do with things like desc.
There was a problem hiding this comment.
Thanks! I've modified the code to pop isexec in loadd_from.
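For readers following along, the "pop it before the rest is treated as hash info" idea might look roughly like this. `loadd_from_sketch` and the returned dict layout are assumptions made for illustration; the real loadd_from in dvc/output/__init__.py is not reproduced here.

```python
def loadd_from_sketch(entries):
    """Hypothetical loader: split output-level flags from hash info."""
    loaded = []
    for entry in entries:
        info = dict(entry)  # copy so the caller's data is untouched
        path = info.pop("path")
        desc = info.pop("desc", None)
        isexec = info.pop("isexec", False)
        # Whatever remains (e.g. md5, size) is treated as hash info.
        loaded.append(
            {"path": path, "desc": desc, "isexec": isexec, "hash_info": info}
        )
    return loaded
```

Because isexec is removed before the remainder is handed on, the hash-info side needs no special ignore list.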
| self.isexec = os.path.isfile(self.def_path) and os.access( | ||
| self.def_path, os.X_OK | ||
| ) |
There was a problem hiding this comment.
| self.isexec = os.path.isfile(self.def_path) and os.access( | |
| self.def_path, os.X_OK | |
| ) | |
| self.isexec = self.tree.isfile(self.path_info) and self.tree.isexec(self.path_info) |
There was a problem hiding this comment.
Good point! I've done this and also updated utils:is_exec function to return boolean by casting to bool:
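The snippet referenced in this comment was not preserved. A minimal sketch of such a helper, assuming it receives a stat mode (st_mode) and should report whether any execute bit is set, could be:

```python
import stat


def is_exec(mode):
    """Return True if any execute bit (user, group, other) is set in mode."""
    # The bitwise AND yields an int; bool() makes the result a real boolean.
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
```

Without the bool() cast the function would return an integer such as 64, which is truthy but would be serialized as a number rather than true/false.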
| metric="metrics" in key, | ||
| plot="plots" in key, | ||
| checkpoint="checkpoints" in key, | ||
| isexec="isexec" in key, |
There was a problem hiding this comment.
isexec can't be set through CLI
| isexec="isexec" in key, |
| def test_add_executable(tmp_dir, dvc): | ||
| tmp_dir.dvc_gen_exec({"foo": "foo"}) |
There was a problem hiding this comment.
This is the only place where dvc_gen_exec is used, better to just inline it here
There was a problem hiding this comment.
I inlined dvc_gen_exec code here.
| def setUp(self): | ||
| super().setUp() | ||
| self.create_executable(self.EXECUTABLE, self.EXECUTABLE_CONTENT) |
There was a problem hiding this comment.
The only place where create_executable is used, better to inline it here too :)
Btw, we are migrating from unittest to pytest tests, so it would be great to re-write this one with pytest too.
There was a problem hiding this comment.
Inlined create_executable here too.
| if isexec: | ||
| st = os.stat(path_info.fspath) | ||
| os.chmod(path_info.fspath, st.st_mode | stat.S_IEXEC) | ||
There was a problem hiding this comment.
IIRC chmod won't modify the mtime (we use it to detect changes in the workspace), so there is nothing the cache should worry about when doing chmod +x. The chmod part is related more to the output itself than to the cache (notice how we are just passing isexec through, but don't really use anything cache-specific). That's why the saving of isexec is not in cache.save but in output.save. So let's move this to BaseOutput.checkout, right after the cache.checkout call. Also note that it is better to use tree.* methods instead of os.*, as that helps to keep the abstraction for different types of outputs 🙂
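The claim about mtime is easy to check on a POSIX system. The helper below is a throwaway sketch (the function name is made up for this example): it adds the user exec bit to a temporary file and compares the modification time before and after.

```python
import os
import stat
import tempfile


def chmod_keeps_mtime():
    """Add the user exec bit to a temp file; report (mtime unchanged, bit set)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"foo")
        path = f.name
    try:
        before = os.stat(path)
        os.chmod(path, before.st_mode | stat.S_IEXEC)
        after = os.stat(path)
        same_mtime = before.st_mtime_ns == after.st_mtime_ns
        has_exec = bool(after.st_mode & stat.S_IEXEC)
        return same_mtime, has_exec
    finally:
        os.remove(path)
```

On POSIX, chmod updates the inode change time (ctime) but leaves mtime alone, which is why an mtime-based change check does not notice the permission change.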
There was a problem hiding this comment.
Good point! I moved this code to BaseOutput.checkout and used tree.* methods. There is one minor stylistic issue: all methods in BaseTree take path_info as an argument, but I've added BaseTree.isexec(path), since it was not there and all child classes use a path argument in that method:
There was a problem hiding this comment.
I modified isexec to use path_info as an argument
There was a problem hiding this comment.
Also need to add PARAM_ISEXEC to https://github.com/iterative/dvc/blob/76c6c5427816b25a81d90f86d3af1cb19779802f/dvc/stage/utils.py#L178 , so that it is not taken into account when computing the stage hash. This, combined with os.access(X_OK) returning True for non-exec files, is causing the Windows failures, so we can't really skip those tests.
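To illustrate why the key has to be excluded: if isexec fed into the stage hash, the same output would hash differently on a platform that reports the flag differently. The sketch below is not DVC's stage hashing; the exclusion set, the function name and the choice of SHA-256 over JSON are all assumptions made for the example.

```python
import hashlib
import json

PARAM_ISEXEC = "isexec"
# Assumed exclusion list for this sketch only.
EXCLUDED_KEYS = {"desc", PARAM_ISEXEC}


def stage_hash_sketch(outs):
    """Hash output entries while ignoring keys that must not affect the hash."""
    filtered = [
        {k: v for k, v in out.items() if k not in EXCLUDED_KEYS}
        for out in outs
    ]
    payload = json.dumps(filtered, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

With the key filtered out, an entry recorded with isexec on one platform and without it on another produces the same hash, so the stage is not reported as changed.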
| """ | ||
| return True | ||
| def set_exec(self, path_info): |
There was a problem hiding this comment.
Note for us, for the future: time to finally move BaseTree to ABC...
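As a rough idea of what an ABC-based tree could look like, here is a sketch with only the two methods discussed in this thread. `TreeSketch` and `LocalTreeSketch` are hypothetical names, and plain string paths stand in for path_info objects; this is not how BaseTree is actually defined.

```python
import os
import stat
from abc import ABC, abstractmethod


class TreeSketch(ABC):
    """Hypothetical abstract tree; subclasses must implement both methods."""

    @abstractmethod
    def isexec(self, path_info):
        """Return True if the file at path_info is executable."""

    @abstractmethod
    def set_exec(self, path_info):
        """Mark the file at path_info as executable by its owner."""


class LocalTreeSketch(TreeSketch):
    def isexec(self, path_info):
        mode = os.stat(path_info).st_mode
        return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))

    def set_exec(self, path_info):
        mode = os.stat(path_info).st_mode
        os.chmod(path_info, mode | stat.S_IEXEC)
```

With abstract methods, a subclass that forgets to implement isexec or set_exec fails at instantiation instead of silently inheriting a default.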
One more thing would be to use this in

Will there be a WARN msg or something on Windows? Does it even matter that much? (I think you can usually try to execute any file on Win, it's more about the file extension).

@jorgeorpinel No warning. It is a minor feature that is safe to ignore on Windows. Same thing with git, so it shouldn't be a big surprise for anyone, but we'll see if someone complains.

K thanks. I was just double checking (for docs) as I read about that warning somewhere (in the OP I think).
* docs: add isexec field description
  Per treeverse/dvc#5061
* Update content/docs/user-guide/dvc-files-and-directories.md
  Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
* Update content/docs/user-guide/dvc-files-and-directories.md
* Update content/docs/user-guide/dvc-files-and-directories.md
  Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
For ticket #4578
Based on the discussion in #5036 I've added isexec parameter.
The plan for this PR:
* test_add_executable with an executable file
* disable this feature on Windows and create a separate ticket to enable it there
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
treeverse/dvc.org#2039
Thank you for the contribution - we'll try to review it as soon as possible. 🙏