api: refactor to use metadata and RepoTree#4458
Conversation
There was a problem hiding this comment.
| tree = RepoTree(_repo, fetch=True) | |
| tree = DvcTree(_repo, fetch=True) |
There was a problem hiding this comment.
DvcTree is an internal API, and does not support subrepos.
There was a problem hiding this comment.
Okay, I think I get why you are suggesting that. As DvcTree does not support subrepos, we need to implement RepoTree.get_dvctree(path) or RepoTree.get_repo(repo_path), as get_hash also hashes git-files.
There was a problem hiding this comment.
Or, get_hash(dvc_only=True)? Thoughts?
There was a problem hiding this comment.
Yeah, I suggest it because this is not meant to work with git files. Not a big fan of dvc_only=True though, but we could simply do:
if not tree.isdvc(path):
raise ....
hash = tree.get_hash()
🙂
There was a problem hiding this comment.
Theoretically we could've made hash part of the metadata, but that might make it complicated for files inside dirs, as you'll need to parse the dir_cache (as we do in _get_granular_checksum, which is not that bad)
There was a problem hiding this comment.
Btw, this will automatically close #3182 for files in the directory 🙂
There was a problem hiding this comment.
Okay, did as you recommended, but metadata/hash/RepoTree all gets mixed. Feels like that thing could be a part of RepoTree on itself.
There was a problem hiding this comment.
Actually no need for this, we could just use tree.get_hash(PathInfo(_repo.root_dir) / path) directly
There was a problem hiding this comment.
Doesn't look like there is a need for this change. Repo.open_by_relpath already does all of this.
There was a problem hiding this comment.
open_by_relpath will most likely go away, as RepoTree supports subrepos directly.
There was a problem hiding this comment.
@skshetry Agreed, it was only used for the API, so we could delete it in that case.
open_by_relpath also has some weird exceptions that need to be double checked. Maybe not worth messing with this right now, your call.
There was a problem hiding this comment.
Exceptions do not look good, I do have to tell you that because Repo and ExternalRepo threw different exception before, but now, I am throwing PathMissingError. Unified, but quite verbose.
This does not yet support subrepos as it requires us to work on setting up cache correctly which is pending.
| tmp_dir.scm_gen({"foo": "foo"}, commit="initial") | ||
|
|
||
| with pytest.raises(UrlNotDvcRepoError, match="not a DVC repository"): | ||
| with pytest.raises(OutputNotFoundError, match="with output 'foo'"): |
There was a problem hiding this comment.
This is changing the API, we shouldn't do it in this PR if we can avoid it.
There was a problem hiding this comment.
Plus OutputNotFoundError is an internal thing, that shouldn't be exposed to api users.
There was a problem hiding this comment.
We do have to break things here as it's not precise enough. OutputNotFoundError is more of a correct term here.
There was a problem hiding this comment.
Also, we already threw OutputNotFoundError. It was just not documented. The following threw OutputNotFoundError.
There was a problem hiding this comment.
@skshetry I think that one was caught by https://github.com/iterative/dvc/blob/master/dvc/external_repo.py#L51 and re-raised properly.
There was a problem hiding this comment.
Ahh missed that. So, we could raise the same exc here?
There was a problem hiding this comment.
@skshetry But this particular test doesn't catch that one, but rather UrlNotDvcRepoError and it should stay that way. The issue is that we are using RepoTree instead of DvcTree right now, which works for git-only repos, when it shouldn't. Could do something simple like if not tree.dvc_tree: raise UrlNotDvcRepoError though, if subrepos are a concern.
| single_stage=True, | ||
| metrics_no_cache=[metric_file], | ||
| cmd=(f'python -c "{metric_code}"'), | ||
| cmd=f'python -c "{metric_code}"', |
|
The suggestions cannot be applied without complete refactoring of the |
|
Looks good, though I would also say that |
|
sorry @skshetry, clicked in close and comment |
|
Closing as #4465 has the same changes. |
This does not yet support subrepos as it requires us to work on setting up cache correctly which is pending.
After that, it should just be changing
subrepos=FalsetoTrue.Still unsure about proper exceptions here, but I did try to unify them somehow (still does not look good though).
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
Closes #3182