Skip to content

tests: mark test_data_cloud::test_pull_git_imports as flaky#3790

Merged
pmrowla merged 1 commit into
treeverse:masterfrom
pmrowla:3570-test-pull-git-imports
May 13, 2020
Merged

tests: mark test_data_cloud::test_pull_git_imports as flaky#3790
pmrowla merged 1 commit into
treeverse:masterfrom
pmrowla:3570-test-pull-git-imports

Conversation

@pmrowla
Copy link
Copy Markdown
Contributor

@pmrowla pmrowla commented May 13, 2020

Fixes #3570.

  • ❗ I have followed the Contributing to DVC checklist.

  • 📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.

  • ❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

As far as I can tell there is nothing wrong with the test as written. I think the issue is related to inodes sometimes being re-used when we delete files in between pull calls.

From the logs, inode 2307858 is reused after the file deletion calls. Note that the test failure is due to foo not being re-fetched during the second pull call after the deletions.

...
DEBUG    dvc.utils.fs:fs.py:32 Path 'new_dir/bar' inode '2307858'
...'
DEBUG    dvc.utils.fs:fs.py:32 Path '.dvc/cache/37/b51d194a7513e45b56f6524f2d51f2' inode '2307858'
...
DEBUG    dvc.utils.fs:fs.py:133 Removing 'foo'
DEBUG    dvc.utils.fs:fs.py:133 Removing 'new_dir'
DEBUG    dvc.utils.fs:fs.py:133 Removing '/tmp/pytest-of-travis/pytest-2/test_pull_git_imports_git_dir_0/.dvc/cache'
DEBUG    dvc.utils.fs:fs.py:133 Removing '/tmp/tmprd0s356zdvc-clone'
DEBUG    dvc.utils.fs:fs.py:133 Removing '/tmp/tmp5ho05lfidvc-erepo'
DEBUG    dvc.utils.fs:fs.py:133 Removing '/tmp/tmp2wni80trdvc-erepo'
...
DEBUG    dvc.utils.fs:fs.py:32 Path '../../../tmp7v6m5iztdvc-erepo/foo' inode '2307858'
...

For the second pull call, what we do is:

  1. try to pull dir into new_dir
    • this creates new cache/state entry for the re-fetched new_dir/bar
  2. try to pull foo
    1. state looks up inode for the new foo, which coincidentally matches the inode for the deleted new_dir/bar
    2. the state db entry for deleted new_dir/bar (inode 2307858) still exists
    3. state returns checksum for deleted new_dir/bar (but thinks its the checksum for new foo)
    4. we see that checksum for new_dir/bar is already in cache (since it was fetched in step 1)
    5. we do a bunch of other things assuming that the stale state db entries for inode 2307858 are still correct, and eventually fail to fetch/checkout foo

So I think this may actually be a minor state bug, although I'm not sure how serious it is given that manually deleting your .dvc/cache directory (but not state db) in this way is not exactly a typical use case.

@pmrowla pmrowla requested review from efiop, pared and skshetry May 13, 2020 06:39
@pmrowla
Copy link
Copy Markdown
Contributor Author

pmrowla commented May 13, 2020

Also kind of interesting that this hasn't show up before in other tests where we are deleting the cache dir and files between dvc command runs

@pmrowla pmrowla self-assigned this May 13, 2020
Copy link
Copy Markdown
Collaborator

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, great findings @pmrowla. Were you able to locate reproduce locally?

Also kind of interesting that this hasn't show up before in other tests where we are deleting the cache dir and files between dvc command runs

AFAIK, we also check for mtime and size before returning checksum. Maybe the "mtime" also matched here?

@pmrowla
Copy link
Copy Markdown
Contributor Author

pmrowla commented May 13, 2020

Wow, great findings @pmrowla. Were you able to locate reproduce locally?

Also kind of interesting that this hasn't show up before in other tests where we are deleting the cache dir and files between dvc command runs

AFAIK, we also check for mtime and size before returning checksum. Maybe the "mtime" also matched here?

I was not able to reproduce it locally - I'm not even sure what the best way to go about reproducing it would be? Maybe something like setting up a VM and manually creating a disk partition with a really low max inode count to force re-use of inodes on file deletion?

@pmrowla pmrowla merged commit 3916d4f into treeverse:master May 13, 2020
@pmrowla pmrowla deleted the 3570-test-pull-git-imports branch May 13, 2020 15:16
@efiop
Copy link
Copy Markdown
Contributor

efiop commented May 13, 2020

@pmrowla Great research! We used to have similar problems with inodes back when we had hardlinks by default, good times 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test: test_pull_git_imports is flaky

4 participants