Skip to content

dvc: preserve exec bit for tracked files#5061

Merged
efiop merged 27 commits into
treeverse:masterfrom
dudarev:4578-save-is-user-executable
Dec 20, 2020
Merged

dvc: preserve exec bit for tracked files#5061
efiop merged 27 commits into
treeverse:masterfrom
dudarev:4578-save-is-user-executable

Conversation

@dudarev
Copy link
Copy Markdown
Contributor

@dudarev dudarev commented Dec 8, 2020

For ticket #4578

Based on the discussion in #5036 I've added isexec parameter.

The plan for this PR:

  • add test_add_executable with an executable file
  • make all tests pass
  • figure out how to test mode change on checkout
  • implement chmod on checkout
  • fix all tests if they will be broken on the previous step
  • disable this feature on Windows and create a separate ticket to enable it there

treeverse/dvc.org#2039

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@dudarev dudarev changed the title #4578 added is_user_executable to dvc file [WIP] #4578 added is_user_executable to dvc file Dec 8, 2020
Comment thread dvc/output/base.py Outdated
Comment on lines +139 to +141
self.is_user_executable = os.path.isfile(path) and os.access(
path, os.X_OK
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file might not exist here or might not be in the same state as the rest of the info. In this place we should load the isexec from the info provided, e.g. how metric is loaded above.

The actual stat should happen in save() method, as pointed out in #5036 (comment) .

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the clarification. I moved stat to save().

Comment thread dvc/output/base.py Outdated
Comment on lines +335 to +336
ret[self.PARAM_IS_USER_EXECUTABLE] = self.is_user_executable

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as in other fields above

Suggested change
ret[self.PARAM_IS_USER_EXECUTABLE] = self.is_user_executable
if self.is_user_executable:
ret[self.PARAM_IS_USER_EXECUTABLE] = self.is_user_executable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was leaving it outside of if to have the test that I had at that stage pass, put it behind if now.

Comment thread dvc/output/base.py Outdated
PARAM_PLOT_HEADER = "header"
PARAM_PERSIST = "persist"
PARAM_DESC = "desc"
PARAM_IS_USER_EXECUTABLE = "is_user_executable"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could even use exec: true/false, as it is specified for a particular output and it doesn't seem like we will have another use for exec here in the future. Could also use isexec. The current is_user_executable seems a bit long. WDYT?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is is very much unnecessary. exec or executable is fine imo.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop @skshetry good suggestion about a shorter variable name. It fits the style of this project well. I hesitated a lot, probably too much, between exec and isexec and decided to go with exec. Unfortunately, as a result, I've got a pylint error:

W0622: Redefining built-in 'exec' (redefined-builtin)

so I've started using isexec instead. Please let me know if you have a stronger opinion about using executable instead.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good suggestion about a shorter variable name.

The suggestion was regarding the keyword in the dvc.yaml/dvcfile, not the variable name. I have no strong opinion towards either exec or isexec, but the decision should not be based on that pylint complained.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that isexec seems better than exec as the latter sounds like "execute this output: true" which is confusing.

Copy link
Copy Markdown
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! The plan looks pretty good 🙂

Comment thread dvc/hash_info.py Outdated
class HashInfo:
PARAM_SIZE = "size"
PARAM_NFILES = "nfiles"
PARAMS_TO_IGNORE = ("isexec",)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop I'm not sure if this is a good approach and there could be a better way to ignore isexec from .dvc file in HashInfo. I plan to look at a better way to handle it later but maybe you have some quick suggestions?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dudarev Take a look at loadd_from in dvc/output/__init__.py. You just need to pop the value for isexec from info, same as we do with things like desc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've modified the code to pop isexec in loadd_from.

Comment thread dvc/output/base.py Outdated
Comment on lines +294 to +296
self.isexec = os.path.isfile(self.def_path) and os.access(
self.def_path, os.X_OK
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.isexec = os.path.isfile(self.def_path) and os.access(
self.def_path, os.X_OK
)
self.isexec = self.tree.isfile(self.path_info) and self.tree.isexec(self.path_info)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I've done this and also updated utils:is_exec function to return boolean by casting to bool:

https://github.com/iterative/dvc/blob/167ab79b8029d3272e6aa056c12305b6c9befeee/dvc/utils/__init__.py#L463

Comment thread dvc/stage/utils.py Outdated
metric="metrics" in key,
plot="plots" in key,
checkpoint="checkpoints" in key,
isexec="isexec" in key,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isexec can't be set through CLI

Suggested change
isexec="isexec" in key,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it

Comment thread tests/func/test_add.py Outdated


def test_add_executable(tmp_dir, dvc):
tmp_dir.dvc_gen_exec({"foo": "foo"})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only place where dvc_gen_exec is used, better to just inline it here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I inlined dvc_gen_exec code here.

Comment thread tests/func/test_checkout.py Outdated
def setUp(self):
super().setUp()

self.create_executable(self.EXECUTABLE, self.EXECUTABLE_CONTENT)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place where create_executable is used, better to inline it here too :)

Btw, we are migrating from unittest to pytest tests, so it would be great to re-write this one with pytest too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlined created_executable here too.

Comment thread dvc/cache/base.py Outdated
Comment on lines +447 to +450
if isexec:
st = os.stat(path_info.fspath)
os.chmod(path_info.fspath, st.st_mode | stat.S_IEXEC)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC chmod won't modify the mtime(we use it to detect changes in the workspace), so there is nothing cache should worry about when doing chmod +x. The chmod part is more related to the output itself, than to the cache (notice how we are just passing through the isexec, but don't really use anything cache-specific). That's why the saving of isexec is not in cache.save but is actually in output.save. So let's move this to OutputBase.checkout, right after cache.checkout call. Also note that it is better to use tree.* methods instead of os.*, as that helps to keep the abstraction for different types of outputs 🙂

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I moved this code to BaseOutput.checkout and used tree.* methods. There is one minor stylistic issue: all methods in BaseTree are using path_info as an argument, I've added BaseTree.isexec(path) since it was not there and all children classes use path argument in that method:

https://github.com/iterative/dvc/blob/2ade7f7ba56c1b76924f9b5b7f379225b89e1914/dvc/tree/base.py#L198

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified isexec to use path_info as an argument

Copy link
Copy Markdown
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to add PARAM_ISEXEC to https://github.com/iterative/dvc/blob/76c6c5427816b25a81d90f86d3af1cb19779802f/dvc/stage/utils.py#L178 , so it is not taking it into account when computing stage hash. This combined with os.access(X_OK) returning True for non-exec files, is resulting in windows failures, so we can't really skip those tests.

@dudarev
Copy link
Copy Markdown
Contributor Author

dudarev commented Dec 19, 2020

test_add_executable fails on Windows, because isexec is not set on Windows. I need to investigate:

https://github.com/iterative/dvc/blob/2ade7f7ba56c1b76924f9b5b7f379225b89e1914/tests/func/test_add.py#L70

E         {'outs': [{'md5': 'acbd18db4cc2f85cedef654fccc4a4d8', 'path': 'foo', 'size': 3}]} != {'outs': [{'isexec': True, 'md5': 'acbd18db4cc2f85cedef654fccc4a4d8', 'path': 'foo', 'size': 3}]}

Comment thread tests/func/test_add.py Outdated
Comment thread tests/func/test_add.py Outdated
Comment thread tests/func/test_add.py Outdated
Comment thread tests/func/test_add.py
Comment thread dvc/cache/base.py Outdated
Comment thread dvc/output/base.py Outdated
Comment thread dvc/output/base.py Outdated
Comment thread dvc/tree/base.py
"""
return True

def set_exec(self, path_info):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for us, for the future: time to finally move BaseTree to ABC...

Comment thread dvc/tree/local.py Outdated
Comment thread tests/func/test_add.py
@efiop efiop changed the title [WIP] dvc: preserve exec bit for tracked files dvc: preserve exec bit for tracked files Dec 19, 2020
@efiop efiop changed the title dvc: preserve exec bit for tracked files [WIP] dvc: preserve exec bit for tracked files Dec 19, 2020
efiop added a commit to treeverse/dvc.org that referenced this pull request Dec 20, 2020
Copy link
Copy Markdown
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! 🙏

efiop added a commit to treeverse/dvc.org that referenced this pull request Dec 20, 2020
efiop added a commit to treeverse/dvc.org that referenced this pull request Dec 20, 2020
@efiop
Copy link
Copy Markdown
Contributor

efiop commented Dec 20, 2020

One more thing would be to use this in dvc list, but that could be added later.

@efiop efiop changed the title [WIP] dvc: preserve exec bit for tracked files dvc: preserve exec bit for tracked files Dec 20, 2020
@efiop efiop merged commit dcea83e into treeverse:master Dec 20, 2020
@jorgeorpinel
Copy link
Copy Markdown
Contributor

Will there be a WARN msg or something on Windows? Does it even matter that much? (I think you can usually try to execute any file on Win, it's more about the file extension).

@efiop
Copy link
Copy Markdown
Contributor

efiop commented Dec 20, 2020

@jorgeorpinel No warning. It is a minor feature that is safe to ignore on Windows. Same thing with git, so shouldn't be a big suprise for anyone, but we'll see if someone complains.

@jorgeorpinel
Copy link
Copy Markdown
Contributor

K thanks. I was just double checking (for docs) as I read about that warning somewhere (in the OP I think).

shcheklein pushed a commit to treeverse/dvc.org that referenced this pull request Dec 21, 2020
* docs: add isexec field description

Per treeverse/dvc#5061

* Update content/docs/user-guide/dvc-files-and-directories.md

Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>

* Update content/docs/user-guide/dvc-files-and-directories.md

* Update content/docs/user-guide/dvc-files-and-directories.md

Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants