Run graph checks on collect/find_outs_by_path #5035
We try to optimize `tree.exists` calls, and probably a few others, so that they look directly into either the workspace or the cache without running graph checks. Because of those optimizations, it does not seem possible to run graph checks only in `find_outs_by_path`. That is why `collect` also runs a graph check.

Fixes treeverse#5027
Fixes treeverse#4010

    - if not outs:
    -     outs = [out for stage in self.stages for out in stage.outs]
    + # using `outs_graph` to ensure graph checks are run
    + outs = outs or self.outs_graph

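To illustrate why falling back to `outs_graph` triggers the checks, here is a minimal sketch. It is not DVC's actual implementation: the `Repo`, `Stage`, `collect_outs`, `_check_overlaps`, and `OverlappingOutputError` names are hypothetical, outputs are plain path strings, and `outs_graph` returns a list rather than a real graph. The only idea carried over from the diff is that the outputs are reached through a property that validates before returning, instead of by iterating `self.stages` directly.

```python
from functools import cached_property


class OverlappingOutputError(Exception):
    """Hypothetical error: one output path is equal to or nested in another."""


class Stage:
    def __init__(self, name, outs):
        self.name = name
        self.outs = list(outs)


def _check_overlaps(outs):
    # Toy stand-in for a graph check: reject equal or nested output paths.
    for i, a in enumerate(outs):
        for b in outs[i + 1:]:
            if a == b or a.startswith(b + "/") or b.startswith(a + "/"):
                raise OverlappingOutputError(f"{a!r} overlaps with {b!r}")


class Repo:
    def __init__(self, stages):
        self.stages = list(stages)

    @cached_property
    def outs_graph(self):
        # Flatten all outputs, then validate before handing them out.
        outs = [out for stage in self.stages for out in stage.outs]
        _check_overlaps(outs)
        return outs

    def collect_outs(self, outs=None):
        # Same shape as the changed line: with no explicit outs, go through
        # `outs_graph`, so the validation cannot be skipped.
        return outs or self.outs_graph
```

With this shape, a repo whose stages declare `data` and `data/raw` raises on `collect_outs()`, while a caller that passes explicit `outs` bypasses the check entirely, which mirrors the trade-off discussed in the review below.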
The disadvantage might be that, for example, `dvc.api.open` starts raising these unwanted errors whenever the graph is not correct.
Maybe, instead of raising these errors every time, we should only error out if n(outs) > 1 in the RepoTree?
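One possible reading of that suggestion, sketched under assumptions: the lookup stays unchecked, and an error is raised only when more than one output matches the requested path. The function below is a hypothetical standalone helper over plain path strings, not the real `find_outs_by_path` signature, and `AmbiguousOutputError` is an invented name.

```python
class AmbiguousOutputError(Exception):
    """Hypothetical error: several outputs claim the same path."""


def find_outs_by_path(outs, path):
    # An output matches if it is the path itself or a parent directory of it.
    matches = [out for out in outs if path == out or path.startswith(out + "/")]
    if len(matches) > 1:
        # Only now does the invalid graph actually affect this lookup.
        raise AmbiguousOutputError(f"{path!r} is claimed by {matches!r}")
    return matches
```

Under this approach a repo with overlapping outputs `data` and `data/raw` would still serve `data/other.csv`, and would fail only for paths under `data/raw`.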
But maybe it's not the tree that needs to worry about this.
There are lots of assumptions throughout the code that our outs/stages are in check with a proper DAG, so we do indeed need to check the graph. There were some discussions about whether some of the DAG checks are really necessary (e.g. overlapping outputs might be used to `dvc checkout` particular versions of datasets on demand), but so far there hasn't been a good scenario that people were actively asking for.
RepoDependency, for example, doesn't have any path_info. See: treeverse#4938 (comment). Related: treeverse#5035
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.