Skip to content

dvc: support group cache mode #2298

Merged
efiop merged 8 commits into
treeverse:masterfrom
efiop:marcelo
Aug 9, 2019
Merged

dvc: support group cache mode #2298
efiop merged 8 commits into
treeverse:masterfrom
efiop:marcelo

Conversation

@efiop
Copy link
Copy Markdown
Contributor

@efiop efiop commented Jul 19, 2019

  • Have you followed the guidelines in our
    Contributing document?

  • Does your PR affect documented changes or does it add new functionality
    that should be documented? If yes, have you created a PR for
    dvc.org documenting it or at
    least opened an issue for it? If so, please add a link to it.

treeverse/dvc.org#497

Fixes #2287


Comment thread dvc/remote/local/__init__.py Outdated
Comment thread dvc/remote/base.py Outdated
Comment thread dvc/remote/base.py Outdated
@efiop efiop force-pushed the marcelo branch 3 times, most recently from b03622f to 45dcddb Compare July 21, 2019 01:16
@efiop efiop changed the title [WIP] dvc: support group cache mode dvc: support group cache mode Jul 21, 2019
Comment thread dvc/remote/azure.py Outdated
Comment thread dvc/remote/azure.py Outdated
Comment thread dvc/remote/base.py Outdated
@efiop efiop changed the title dvc: support group cache mode [WIP] dvc: support group cache mode Jul 21, 2019
@efiop efiop force-pushed the marcelo branch 5 times, most recently from 97c9659 to e3b7bb3 Compare August 6, 2019 03:46
@efiop efiop requested a review from Suor August 6, 2019 04:07
@efiop
Copy link
Copy Markdown
Contributor Author

efiop commented Aug 6, 2019

@Suor Adjusted the patch. Please take a look 🙂

@efiop efiop changed the title [WIP] dvc: support group cache mode dvc: support group cache mode Aug 6, 2019
Copy link
Copy Markdown

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop , is this intended to support remote shared cache (like SSH)?

Also, the configuration is not that clear to me, you specify shared = group under the [cache] section, right? and this only chmod to 775, but the user will still need to use the setgid of the group instead of the primary one of the user.

Comment thread dvc/remote/base.py Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this implies that cache is local, right? are there any downsides of doing this? like a remote cache with ssh for example

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, if remote supports dirs then it should support makedirs. So currently ssh and local support makedirs.

Comment thread dvc/remote/local/__init__.py Outdated
Comment thread dvc/remote/local/__init__.py Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 a little bit confusing to use a method with the same name under the method, what do you think about using utils.makedirs? or import it differently?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is makedirs and RemoteLOCAL.makedirs(or self.makedirs), so it doesn't create collisions. If we were to change this one to utils.makedirs, then we would have to change move,remove and other helpers too. Feels unnecessary.

Comment thread dvc/utils/__init__.py Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe renaming this to atomic_move? and why not moving this to fs aren't this operations related to the file system? (well, they deal with files, maybe my definition of what should be under fs is way too relaxed 😛 )

Copy link
Copy Markdown
Contributor Author

@efiop efiop Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just replaces the old move that we already had there.

As to moving to fs, we have a separate ticket for it #2137 .

Comment thread dvc/remote/local/__init__.py Outdated
@efiop
Copy link
Copy Markdown
Contributor Author

efiop commented Aug 7, 2019

@MrOutis

@efiop , is this intended to support remote shared cache (like SSH)?

No, it is not. For now.

Also, the configuration is not that clear to me, you specify shared = group under the [cache] section, right? and this only chmod to 775, but the user will still need to use the setgid of the group instead of the primary one of the user.

Do you mean that the user will need to be a part of the group that he wants to share the cache with? Yes, he has to. He can be a part of multiple groups though, so he doesn't have to use setgid manually.

@ghost
Copy link
Copy Markdown

ghost commented Aug 7, 2019

Do you mean that the user will need to be a part of the group that he wants to share the cache with? Yes, he has to. He can be a part of multiple groups though, so he doesn't have to use setgid manually.

@efiop, for example, I'm part of two groups mroutis (my primary group) and dvc, if I create a file under the shared cache directory (by default) it would set the group to my primary one (mroutis) instead of dvc.
Even if permissions allow the group to write/read files, we need to make sure the files are being created within the same group, this could be achieved by expecting the user to setgid before hand or letting DVC chown each file to a given group.

@efiop
Copy link
Copy Markdown
Contributor Author

efiop commented Aug 7, 2019

@MrOutis I don't think that is something dvc should deal with(e.g. git's sharedRepository doesn't deal with it too). If people are working on the same machine and want to share their files through the group, they will need to set that same group as a primary one anyway.

@ghost
Copy link
Copy Markdown

ghost commented Aug 7, 2019

@efiop , got it, let's not forget to add a note about it on the docs

@efiop efiop requested review from a user and pared August 8, 2019 19:52
Copy link
Copy Markdown
Contributor

@Suor Suor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some technical comments below.

Comment thread dvc/remote/local/__init__.py Outdated
Comment thread dvc/remote/local/__init__.py Outdated
Comment thread dvc/utils/__init__.py Outdated
Comment thread dvc/utils/__init__.py Outdated
Comment thread tests/func/test_cache.py Outdated
@efiop
Copy link
Copy Markdown
Contributor Author

efiop commented Aug 9, 2019

@Suor Addressed. Please take a look.

@efiop
Copy link
Copy Markdown
Contributor Author

efiop commented Aug 9, 2019

Oh, DS has some valid notes too. Looking into it...

Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
Comment thread dvc/utils/__init__.py Outdated
Ruslan Kuprieiev added 7 commits August 9, 2019 17:10
In this mode, dvc will set appropriate file/dir permissions to support
sharing cache between different users within the same group.

Fixes treeverse#2287

Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
@efiop efiop merged commit deb3372 into treeverse:master Aug 9, 2019
@DXist
Copy link
Copy Markdown

DXist commented Aug 10, 2019

There is a problem with unprotecting a symlink to a shared cache file owned by another user:


ERROR: unexpected error - [Errno 1] Operation not permitted: '<symlink_to_file>'
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rshigapov/.local/lib/python2.7/site-packages/dvc/main.py", line 40, in main
    ret = cmd.run_cmd()
  File "/home/rshigapov/.local/lib/python2.7/site-packages/dvc/command/base.py", line 63, in run_cmd
    return self.run()
  File "/home/rshigapov/.local/lib/python2.7/site-packages/dvc/command/unprotect.py", line 17, in run
    self.repo.unprotect(target)
  File "/home/rshigapov/.local/lib/python2.7/site-packages/dvc/repo/__init__.py", line 124, in unprotect
    return self.cache.local.unprotect(PathInfo(target))
  File "/home/rshigapov/.local/lib/python2.7/site-packages/dvc/remote/local/__init__.py", line 498, in unprotect
    RemoteLOCAL._unprotect_file(path)
  File "/home/rshigapov/.local/lib/python2.7/site-packages/dvc/remote/local/__init__.py", line 473, in _unprotect_file
    remove(path)
  File "/home/rshigapov/.local/lib/python2.7/site-packages/dvc/utils/__init__.py", line 210, in remove
    _chmod(os.unlink, path, None)
  File "/home/rshigapov/.local/lib/python2.7/site-packages/dvc/utils/__init__.py", line 194, in _chmod
    os.chmod(p, perm)
OSError: [Errno 1] Operation not permitted: '<symlink_to_file>'

@efiop efiop deleted the marcelo branch August 10, 2019 23:18
@efiop
Copy link
Copy Markdown
Contributor Author

efiop commented Aug 10, 2019

@DXist Thanks for reporting this bug! Working on a fix right now. Mind creating an issue for it, please?

EDIT: no need now 🙂

efiop added a commit that referenced this pull request Aug 11, 2019
@efiop
Copy link
Copy Markdown
Contributor Author

efiop commented Aug 11, 2019

@DXist You are using dvc from master, right? 🙂 If so, the fix is already there, please give it a try.

@DXist
Copy link
Copy Markdown

DXist commented Aug 11, 2019

@efiop That fixed the problem and it's a good start to the day. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Set correct permissions for users in a group automatically

4 participants