dvcignore: don't walk directories twice #5098

Merged
efiop merged 1 commit into treeverse:master from efiop:external_repo
Dec 14, 2020
Conversation


@efiop commented Dec 14, 2020

Prevents us from duplicating work by walking into directories a second
time when searching for subrepos. Saves about 1 sec (5.8 -> 4.8) in
`dvc metrics diff` in a big git-only repo, and about 2 sec (9.7 -> 7.7)
in `dvc status` in a repo with big directories that are not dvcignored.

Related to #4284 (comment)
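
To illustrate the general idea of not walking directories twice, here is a minimal sketch that records ignore files and nested repos during a single `os.walk` pass instead of running a separate look-ahead scan for subrepos. It is not the code from this PR: `walk_once` and `build_demo_tree` are hypothetical helpers, and treating any directory that contains a `.dvc` directory as a subrepo is an assumption made for the example.

```python
import os
import tempfile


def walk_once(top):
    """Walk `top` a single time, noting ignore files and nested repos on the way.

    Hypothetical helper for illustration only; not DVC's actual implementation.
    """
    ignore_files = []  # paths of .dvcignore files seen during the walk
    subrepos = []      # directories that contain their own .dvc directory
    visited = 0        # number of directories that were listed

    for root, dirs, files in os.walk(top):
        visited += 1
        if root != top and ".dvc" in dirs:
            # Assumption: a nested .dvc directory marks a subrepo.
            subrepos.append(root)
            dirs[:] = []  # prune in place so os.walk does not descend further
            continue
        if ".dvcignore" in files:
            ignore_files.append(os.path.join(root, ".dvcignore"))
        # Skip the repo's own metadata directories.
        dirs[:] = [d for d in dirs if d not in (".dvc", ".git")]

    return ignore_files, subrepos, visited


def build_demo_tree():
    """Create a throwaway tree: a repo root, one data dir, and one nested repo."""
    top = tempfile.mkdtemp()
    os.makedirs(os.path.join(top, ".dvc"))
    os.makedirs(os.path.join(top, "data"))
    os.makedirs(os.path.join(top, "sub", ".dvc"))
    os.makedirs(os.path.join(top, "sub", "big"))
    open(os.path.join(top, ".dvcignore"), "w").close()
    return top
```

Because the subrepo check happens inside the same loop that collects ignore files, each directory is listed once, and the contents of a nested repo are never listed at all.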

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@efiop added the performance (improvement over resource / time consuming tasks) label Dec 14, 2020
@efiop changed the title from "dvcignore: don't look forward" to "[WIP] dvcignore: don't look forward" Dec 14, 2020
@efiop changed the title from "[WIP] dvcignore: don't look forward" to "[WIP] dvcignore: don't walk directories twice" Dec 14, 2020
Prevents us from duplicating work by walking into directories a second
time when searching for subrepos. Saves about 1 sec (5.8 -> 4.8) in
`dvc metrics diff` in a big git-only repo.

Related to treeverse#4284 (comment)
Comment thread dvc/ignore.py

ignore_pattern = self._get_trie_pattern(root)
if ignore_pattern:
    dirs, files = ignore_pattern(root, dirs, files)
Contributor Author


dirs, files = ignore_pattern(root, dirs, files)
if not ignore_subrepos:
    dirs.extend(self._ignored_subrepos.get(root, []))

This is a bit confusing, but I've decided not to touch this just yet.
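
One possible reading of the two lines quoted above, sketched as a standalone function: the ignore pattern first filters out every ignored entry, subrepos included, and the subrepo directories are then added back when the caller does not want them ignored. The function `filter_entries`, the set-based `ignored_names`, and the dict shape of `ignored_subrepos` are assumptions made for this example and may not match the real data structures in dvc/ignore.py.

```python
def filter_entries(root, dirs, files, ignored_names, ignored_subrepos,
                   ignore_subrepos=True):
    """Drop ignored entries, then optionally restore subrepo directories.

    Hypothetical stand-in for illustration:
      ignored_names    -- set of entry names that the ignore pattern filters out
      ignored_subrepos -- dict mapping a directory to the subrepo names under it
    """
    kept_dirs = [d for d in dirs if d not in ignored_names]
    kept_files = [f for f in files if f not in ignored_names]
    if not ignore_subrepos:
        # Subrepos were filtered out above; put them back for callers
        # that want to walk into them.
        kept_dirs.extend(ignored_subrepos.get(root, []))
    return kept_dirs, kept_files


# Usage: "sub" is a subrepo and "tmp" is an ordinary ignored directory.
dirs_with_subrepos, _ = filter_entries(
    "repo", ["data", "sub", "tmp"], ["a.txt"],
    ignored_names={"sub", "tmp"},
    ignored_subrepos={"repo": ["sub"]},
    ignore_subrepos=False,
)
```

Under this reading, the flag name is what makes the snippet look inverted at first glance: the extra directories are appended only when `ignore_subrepos` is false.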

@efiop changed the title from "[WIP] dvcignore: don't walk directories twice" to "dvcignore: don't walk directories twice" Dec 14, 2020
@efiop merged commit b0b04d4 into treeverse:master Dec 14, 2020
@efiop self-assigned this Dec 15, 2020