Skip to content

sftp: make move across drives possible#2709

Merged
efiop merged 4 commits into
masterfrom
unknown repository
Nov 2, 2019
Merged

sftp: make move across drives possible#2709
efiop merged 4 commits into
masterfrom
unknown repository

Conversation

@ghost
Copy link
Copy Markdown

@ghost ghost commented Nov 1, 2019

Fix #2704

Comment thread dvc/remote/ssh/connection.py Outdated
Comment thread dvc/remote/ssh/connection.py
Copy link
Copy Markdown
Contributor

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments + can we add a test for this please?

@ghost ghost dismissed shcheklein’s stale review November 1, 2019 19:18

thanks for the review, @shcheklein, instead of fallback to an atomic operation, now always moves in two steps

@ghost
Copy link
Copy Markdown
Author

ghost commented Nov 1, 2019

Not sure how to mock across drives operations, tho 🤔
But I've test it manually

Comment thread dvc/remote/ssh/connection.py Outdated
@ghost
Copy link
Copy Markdown
Author

ghost commented Nov 1, 2019

I think that I should return to the previous implementation, fall back only when rename fails, otherwise copying would make the process slower 😞

Comment thread dvc/remote/ssh/connection.py Outdated
self.makedirs(posixpath.dirname(dst))
self.sftp.rename(src, dst)

tmp = tmp_fname(dst)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there something like with tmp ... to make sure we cleanup the stuff if there is some failure in the middle

Copy link
Copy Markdown
Author

@ghost ghost Nov 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein , we can do finally on the try/except block 👍

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I was just thinking that there might be something like with tmp already - since it makes a lot of sense and simplifies + deduplicates the code ... but finally works fine either

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein , but what's the point of doing a two step operation if we are always removing the tmp? Isn't it useful to rollback the src when deleted or at least leave the tmp?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I got the question, @MrOutis . My only point is that if something fails in the middle we don't want to stuck in some weird state (thus atomicity), and we don't want to leave some garbage after us that won't be controlled by anyone and can consume a lot of space (thus with tmp ... or finally or whatnot). with smt is just a way of achieving the second requirement

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein would you mind taking a look?

self.sftp.rename(src, dst)

try:
self.sftp.rename(src, dst)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you 100% sure that rename is never copying data? just to double-check

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the man page correctly, POSIX's rename is atomic: http://man7.org/linux/man-pages/man2/rename.2.html

But it turns out SFTP doesn't follow POSIX standard, there's an specific call sftp.posix_rename for that, it is implemented as an OpenSSH extension (widely used) so I guess it is fine to use it :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was more asking about it failing if you try to rename from one mounted point to another - is is part of the standard as it is for regular posix rename

re atomicity - yeah, it looks like sftp rename is not atomic, but in this sense our own version with tmp file would not be better, so it's probably safe to use it 👍

Copy link
Copy Markdown
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@efiop efiop merged commit 2e6485f into treeverse:master Nov 2, 2019
@ghost ghost deleted the fix-2704 branch November 2, 2019 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected error when cache and artifacts are not on the same partition on SSH

2 participants