get/list/import/api: subrepo support #4247
Closed
skshetry wants to merge 11 commits into
skshetry
commented
Jul 20, 2020
@cached_property
def repo_tree(self):
    return RepoTree(self.tree, [self], stream=True)
Collaborator
Author
We need granular configs in each of the tree operations. I could get around this with `stream_open()` or just this ^ hack.
But we need `fetch` and `stream` to be per-operation rather than per-instance.
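A minimal sketch of the per-operation idea, assuming a hypothetical `SketchTree` class (this is not DVC's actual `RepoTree` API): `stream` and `fetch` remain instance-wide defaults, but any single call can override them.

```python
class SketchTree:
    """Hypothetical stand-in for a tree; not DVC's RepoTree."""

    def __init__(self, files, stream=False, fetch=False):
        self.files = files    # mapping: path -> bytes content
        self.stream = stream  # instance-wide defaults
        self.fetch = fetch
        self.calls = []       # effective (path, stream, fetch) per call

    def _resolve(self, stream, fetch):
        # None means "fall back to the instance default".
        stream = self.stream if stream is None else stream
        fetch = self.fetch if fetch is None else fetch
        return stream, fetch

    def open(self, path, stream=None, fetch=None):
        stream, fetch = self._resolve(stream, fetch)
        self.calls.append((path, stream, fetch))
        return self.files[path]


tree = SketchTree({"data.csv": b"a,b\n"}, stream=False)
tree.open("data.csv")               # uses the instance defaults
tree.open("data.csv", stream=True)  # overrides `stream` for this call only
```

With this shape, a caller that needs streaming for one operation would not have to build a second tree instance just to flip a flag.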
6939210 to
c7d1966
Compare
076e924 to
fa07554
Compare
pared
suggested changes
Jul 22, 2020
Contributor
Shouldn't we use Repo.DVC_DIR?
fa07554 to
aafac24
Compare
One having a top-level dvc and another without it
932a300 to
0b73b5c
Compare
0b73b5c to
c594167
Compare
2 tasks
efiop
pushed a commit
to efiop/dvc
that referenced
this pull request
Jul 25, 2020
This allows us to avoid collecting dvcignore for the whole repo if we only care about particular paths. As a result, in a repo with 2 datasets (2M + 0.5M files), creating a defunct stage takes ~4 sec on 1.2.0, but ~1 sec (most of it is actually dvc module initialization) with this PR. This is also a prerequisite for dynamic dvcignore and subrepo collection (treeverse#4247) while walking the tree. Also, it is important to clarify that regular `dvc status` (without arguments) has the same performance after this PR, because when we check a dataset for changes, we call things like `tree.exists()`, which call dvcignore and make it collect dvcignore in the dataset itself, so we still end up collecting dvcignore for the whole repo (including walking into the datasets). This should be solved soon by telling dvcignore that it shouldn't walk into the datasets searching for `.dvcignore`s.
efiop
added a commit
that referenced
this pull request
Jul 26, 2020
This allows us to avoid collecting dvcignore for the whole repo if we only care about particular paths. As a result, in a repo with 2 datasets (2M + 0.5M files), creating a defunct stage takes ~4 sec on 1.2.0, but ~1 sec (most of it is actually dvc module initialization) with this PR. This is also a prerequisite for dynamic dvcignore and subrepo collection (#4247) while walking the tree. Also, it is important to clarify that regular `dvc status` (without arguments) has the same performance after this PR, because when we check a dataset for changes, we call things like `tree.exists()`, which call dvcignore and make it collect dvcignore in the dataset itself, so we still end up collecting dvcignore for the whole repo (including walking into the datasets). This should be solved soon by telling dvcignore that it shouldn't walk into the datasets searching for `.dvcignore`s.
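The lazy collection described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: `LazyIgnore` is a hypothetical class, and it matches names with `fnmatch` instead of the gitignore-style patterns that `.dvcignore` actually uses. The point is that ignore files are read only for directories between the root and the queried path.

```python
import os
import tempfile
from fnmatch import fnmatch


class LazyIgnore:
    """Hypothetical sketch: read ignore files only for visited directories."""

    def __init__(self, root, ignore_name=".dvcignore"):
        self.root = os.path.abspath(root)
        self.ignore_name = ignore_name
        self._cache = {}  # directory -> list of patterns

    def _patterns(self, dirname):
        # Read each directory's ignore file at most once.
        if dirname not in self._cache:
            path = os.path.join(dirname, self.ignore_name)
            patterns = []
            if os.path.isfile(path):
                with open(path) as fobj:
                    patterns = [
                        line.strip()
                        for line in fobj
                        if line.strip() and not line.startswith("#")
                    ]
            self._cache[dirname] = patterns
        return self._cache[dirname]

    def is_ignored(self, path):
        rel = os.path.relpath(os.path.abspath(path), self.root)
        parts = rel.split(os.sep)
        dirname = self.root
        for index, part in enumerate(parts):
            # Simplification: patterns in `dirname` apply to every
            # path component below it.
            for pattern in self._patterns(dirname):
                if any(fnmatch(name, pattern) for name in parts[index:]):
                    return True
            dirname = os.path.join(dirname, part)
        return False


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a"))
    os.makedirs(os.path.join(root, "b"))  # never queried, so never read
    with open(os.path.join(root, ".dvcignore"), "w") as fobj:
        fobj.write("*.tmp\n")
    ignore = LazyIgnore(root)
    ignored = ignore.is_ignored(os.path.join(root, "a", "x.tmp"))
    kept = ignore.is_ignored(os.path.join(root, "a", "x.csv"))
    visited = sorted(os.path.relpath(d, ignore.root) for d in ignore._cache)
```

After the two queries, only the root and `a` have been consulted; `b` is untouched, which is the saving the commit message describes for path-scoped operations.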
pmrowla
reviewed
Jul 28, 2020
# git-only erepo's do not need dvctree
self.dvctree = None
def __init__(
    self, tree, subrepos=None, **kwargs
Contributor
It looks a bit strange to pass a list of subrepos into RepoTree like this. It seems like a RepoTree should be able to find nested subrepos by walking itself.
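As a rough sketch of what "finding nested subrepos by walking" could look like, the hypothetical helper below uses plain `os.walk` on a local directory rather than DVC's tree abstraction, and treats any directory containing a `.dvc` directory (the value of `Repo.DVC_DIR`) as a repo root. `find_subrepos` is an illustrative name, not part of this PR.

```python
import os
import tempfile

DVC_DIR = ".dvc"  # same value as Repo.DVC_DIR


def find_subrepos(root):
    """Return directories under `root` (root included) that contain DVC_DIR."""
    found = []
    for dirpath, dirnames, _ in os.walk(root):
        if DVC_DIR in dirnames:
            found.append(dirpath)
            dirnames.remove(DVC_DIR)  # do not descend into .dvc itself
        if ".git" in dirnames:
            dirnames.remove(".git")   # skip git internals
    return found


with tempfile.TemporaryDirectory() as root:
    # A git-only top level (no top-level .dvc) with two nested subrepos.
    os.makedirs(os.path.join(root, "sub1", DVC_DIR))
    os.makedirs(os.path.join(root, "nested", "sub2", DVC_DIR))
    os.makedirs(os.path.join(root, "plain"))
    subrepos = sorted(os.path.relpath(p, root) for p in find_subrepos(root))
```

Here `subrepos` contains `sub1` and `nested/sub2`, while `plain` and the root are not reported because neither has a `.dvc` directory.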
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)
This PR currently only works with `ExternalRepo` (should `Repo()` need subrepo support, or should that be via `RepoTree`? Probably both).

TODO:
subrepo support inside `Repo`
`dvcx`? Currently, it seems it's broken, well before this PR. Also, `dvcx` depends on the git repo having a top-level `dvc` repo, which this PR does not work with at all.

Possible future work:
`RepoTree.stat()`?

Thank you for the contribution - we'll try to review it as soon as possible. 🙏