RepoTree: work with dirty repos#4005
Conversation
|
The failing test might be related to #3991 CC @pmrowla https://travis-ci.com/github/iterative/dvc/jobs/346909743#L8052 |
There was a problem hiding this comment.
This is what has been causing that test failure. We rely on a side-effect here, that might not work if the tree is dirty. So now we try to open first, which might download the cache, and then check if we indeed have the cache now and if not - try to save it to cache. CC @pmrowla
There was a problem hiding this comment.
This looks kind of strange now, since we're calling changed_cache twice, before and after tree.open
There was a problem hiding this comment.
I think that relying on this side effect the way it was originally written is correct, since we should only be calling open specifically when we need to download the cache file (which was checked when we call changed_cache before calling open)
With your changes to DvcTree/RepoTree, but without this change to _save_file, tests/func/test_update.py::test_update_import_after_remote_updates_to_dvc fails, but I think that is a bug in the test.
If I make this change in that test:
with erepo_dir.branch("branch", new=False), erepo_dir.chdir():
erepo_dir.scm.repo.index.remove(["version"])
erepo_dir.dvc_gen("version", "updated")
- erepo_dir.scm.add(["version", "version.dvc"])
+ erepo_dir.scm.add(["version.dvc"])
erepo_dir.scm.commit("upgrade to DVC tracking")
new_rev = erepo_dir.scm.get_rev()it passes with the original _save_file implementation (but with your changes to prefer dirty version in DvcTree).
I think the issue is that scm.add() calls git.index.add(<path>) directly, which is not the same as git add <path>. It looks like git.index.add doesn't respect gitignores, so the original test actually makes both version and version.dvc into git-tracked files.
In that test, when we open("version") to copy/fetch it into cache, repo.tree.exists("version") should be false, since it's a DVC out from our clean erepo clone tree, which does not have any dirty/modified copy of version at all. version.dvc should exist in the git tree, but not .gitignored DVC out file version.
I think the issue w/scm.add() not respecting .gitignore can be verified with
def test_gittree_add_ignore(tmp_dir, scm):
tmp_dir.gen({"foo": "foo", ".gitignore": "foo"})
with tmp_dir.chdir():
scm.add(["foo", ".gitignore"])
scm.commit("init")
tree = scm.get_tree("HEAD")
assert not tree.exists(tmp_dir / "foo")(test will fail in master)
There was a problem hiding this comment.
Great explanation! You are absolutely right, we don't have to touch that line for that particular test right now.
The double if changed_cache is just our way of verifying the side-effect to tell if the cache was actually downloaded and there is nothing we need to do here.
Also, it seems like wiith the working tree we don't really go through that branch of execution (because of the if tree), so my change might not even affect it. So that also might be something that is just not abstracted yet.
I'll remove that change in favor of fixing the test itself, and will keep it in mind for the future.
Thanks!
56378fc to
7569772
Compare
pmrowla
left a comment
There was a problem hiding this comment.
see comments on remote/base.py
Currently `RepoTree.open` will throw FileNotFoundError if you have a file that is tracked by dvc, but which has no md5 or a cache for which is missing, even though you have the file itself in your workspace. This corresponds very nicely to the `--no-commit` or `--no-exec` scenario, where you might not want to commit your artifacts to cache just yet. Fixes treeverse#3974
| new_rev = None | ||
| with erepo_dir.branch("branch", new=False), erepo_dir.chdir(): | ||
| erepo_dir.scm.repo.index.remove(["version"]) | ||
| erepo_dir.dvc_gen("version", "updated") |
Reverts an issue caused by treeverse#4005
Reverts an issue caused by #4005
Currently
RepoTree.openwill throw FileNotFoundError if you have afile that is tracked by dvc, but which has no md5 or a cache for which
is missing, even though you have the file itself in your workspace.
This corresponds very nicely to the
--no-commitor--no-execscenario, where you might not want to commit your artifacts to cache
just yet.
Fixes #3974
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.
❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)
Thank you for the contribution - we'll try to review it as soon as possible. 🙏