Skip to content

dvc: introduce merge-driver#4298

Merged
efiop merged 1 commit into
treeverse:masterfrom
efiop:fix-4162
Aug 19, 2020
Merged

dvc: introduce merge-driver#4298
efiop merged 1 commit into
treeverse:masterfrom
efiop:fix-4162

Conversation

@efiop
Copy link
Copy Markdown
Contributor

@efiop efiop commented Jul 28, 2020

Related to #4162

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

TODO:

  • more tests

@efiop efiop force-pushed the fix-4162 branch 6 times, most recently from 81f360e to aabf1d7 Compare July 29, 2020 22:54
efiop pushed a commit to treeverse/dvc.org that referenced this pull request Aug 10, 2020
efiop pushed a commit to treeverse/dvc.org that referenced this pull request Aug 10, 2020
@efiop efiop force-pushed the fix-4162 branch 2 times, most recently from 9dbceb7 to 346f169 Compare August 10, 2020 17:59
@efiop efiop changed the title [WIP] dvc: introduce merge-driver dvc: introduce merge-driver Aug 11, 2020
@efiop efiop requested review from pared, pmrowla and skshetry and removed request for skshetry August 11, 2020 08:08
Comment thread dvc/dvcfile.py Outdated
Comment on lines 262 to 259
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These stubs are for the future, as merging dvcfiles is potentially a handy thing to have.

Comment thread dvc/cache/base.py Outdated
Comment on lines 604 to 605
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a similar thing in _collect_dir in BaseTree's, but these will be unified as a part of #4144 , once dir hash computation is detached from cache.

@efiop efiop changed the title dvc: introduce merge-driver [WIP] dvc: introduce merge-driver Aug 11, 2020
efiop pushed a commit to treeverse/dvc.org that referenced this pull request Aug 11, 2020
@pared
Copy link
Copy Markdown
Contributor

pared commented Aug 12, 2020

@efiop
What about conflicting files? I prepared sample script:

#!/bin/bash

rm -rf repo
mkdir repo

pushd repo

git init --quiet
dvc init --quiet

git add -A
git commit -am "init"

dvc install
echo "*.dvc merge=dvc" >> .gitattributes
git add -A
git commit -m "setup hooks"

git checkout -b one
mkdir data
echo foo >> data/foo

dvc add data
git add -A
git commit -m "add foo"

git checkout master

git checkout -b two
mkdir data
echo bar >> data/foo

dvc add data
git add -A
git commit -m "add foo but bar"

git checkout master

git merge one --no-edit
git merge two --no-edit
# I think we should get error now

Where I have 2 branches:
one that has dvc add-ed data/foo with foo content
two that has dvc add-ed data/foo with bar content

and then, on master I merge one and then two.
My data.dvc gets silently overridden so that data/foo contains bar - the two branch content.
Shouldn't we get some error?

@efiop
Copy link
Copy Markdown
Contributor Author

efiop commented Aug 15, 2020

@pared Sorry for the delay.

git checkout master
git merge one --no-edit
# I think we should get error now

Not really, because there is nothing in master, so there is no conflict.

git merge two --no-edit

And here you get a conflict, as expected.

@pared
Copy link
Copy Markdown
Contributor

pared commented Aug 17, 2020

@efiop sorry, I meant that I would expect error after the comment.
And yes, there is conflict, but my .dvc file gets Auto-merged. It is some indicator of failure, but it is easy to omit. I would expect that git will not try to auto-merge. Choosing which file is the proper one should be users responsibility.
Now the content of data/foo depends on the order of merge:

git merge one --no-edit
git merge two --no-edit
# cat data/foo == bar
git merge two --no-edit
git merge one --no-edit
# cat data/foo == foo

If we were doing same thing (tracking data/foo) in git, we would get unresolved conflict.

EDIT:
git equivalent:

#!/bin/bash

rm -rf repo
mkdir repo

pushd repo

git init --quiet
echo other_file >> other_file
git add other_file

git add -A
git commit -am "init"

git checkout -b one
mkdir data
echo foo >> data/foo

git add data
git add -A
git commit -m "add foo"

git checkout master

git checkout -b two
mkdir data
echo bar >> data/foo

git add data
git add -A
git commit -m "add foo but bar"

git checkout master

git merge two --no-edit
git merge one --no-edit
#merge conflict

@efiop
Copy link
Copy Markdown
Contributor Author

efiop commented Aug 18, 2020

@pared Ah, got it. Yes, that's a bug, thank you! 🙏 Finally getting to creating additional tests for it, will be sure to add one for that as well.

@efiop efiop changed the title [WIP] dvc: introduce merge-driver dvc: introduce merge-driver Aug 18, 2020
@efiop efiop merged commit ac24b5c into treeverse:master Aug 19, 2020
@efiop efiop deleted the fix-4162 branch August 19, 2020 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants