Skip to content

erepo: fix relative url on default remote (#2756)#3378

Merged
efiop merged 12 commits into
treeverse:masterfrom
tizoc:fix-2756
Feb 26, 2020
Merged

erepo: fix relative url on default remote (#2756)#3378
efiop merged 12 commits into
treeverse:masterfrom
tizoc:fix-2756

Conversation

@tizoc
Copy link
Copy Markdown
Contributor

@tizoc tizoc commented Feb 21, 2020

  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • 📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Pending

  • figure out how to write a functional test for this

@tizoc
Copy link
Copy Markdown
Contributor Author

tizoc commented Feb 21, 2020

@pared @efiop added a test, I'm not entirely sure about it.

What I tried to do there is to reproduce the script in the description of the issue, but maybe that is too much and it would be better to just check that the path is absolute and that it exists? not sure. I will wait for your input on that.

@tizoc
Copy link
Copy Markdown
Contributor Author

tizoc commented Feb 21, 2020

Note that this change only modifies the default remote, should all remotes with relative urls be taken care of instead?

Comment thread dvc/external_repo.py Outdated
Comment thread dvc/external_repo.py Outdated
Comment thread dvc/external_repo.py Outdated
Comment thread dvc/external_repo.py Outdated
Copy link
Copy Markdown
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @tizoc ! 🙏 Looks good! Please see a few comments above.

@tizoc
Copy link
Copy Markdown
Contributor Author

tizoc commented Feb 23, 2020

Thanks for the feedback everyone. Just pushed a new version based on your comments.

Branching has been moved to it's own function (_fix_upstream, wraps the other two), which then calls _add_upstream or _fix_local_remote (not "upstream" because it can be any remote passed as argument) depending on what is needed.

If other local relative remotes happen to need handling, _fix_local_remote can be used as-is.

Comment thread dvc/external_repo.py Outdated
Copy link
Copy Markdown
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good, left few comments.

Comment thread tests/func/test_external_repo.py Outdated
Comment thread tests/func/test_external_repo.py Outdated
Comment thread tests/func/test_external_repo.py
Comment thread dvc/external_repo.py Outdated
@tizoc
Copy link
Copy Markdown
Contributor Author

tizoc commented Feb 24, 2020

@pared thank you! just made the changes you requested.

Copy link
Copy Markdown
Contributor

@gurobokum gurobokum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job, @tizoc! I've shared minor suggestions about improving readability

Comment thread dvc/external_repo.py
Comment thread dvc/external_repo.py Outdated
"url": cache_dir
}
self.config["core"]["remote"] = "auto-generated-upstream"
upstream_name = self.config["core"].get("remote")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remote_name is better name then upstream_name IMHO cause it is used in _fix_local_remote and more intuitive

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this one isn't any remote, right? what is being checked here is the default remote, not the other remotes (if any). Anyway, both names work for me, let me know what you think.

Comment thread dvc/external_repo.py Outdated
src_repo.close()

def _fix_local_remote(self, src_repo, remote_name):
# Keep the relative path for a local remote relative
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment is not relevant:

  1. Cause the paths are absolute as Config.RelPath
  2. There are no any conversions of the paths

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I guess the wording is not correct in that comment. The path is absolute, but it was built from a relative URL, and thats why it is being changed with the path from the source repo.

What do you think of making the test a check to see if the path is relative instead?

if isinstance(new_remote["url"], RelPath):
    ...

This makes it explicit that what we care about there is paths with urls relative to the repository, with no more clarification required.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tizoc maybe if not os.path.isabs(new_url) then? It would work with strings and pathlib objects. Just worried about this being easilly broken with some hand-written dict config in tests or elsewhere.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I had not thought about that. But you are right, if not os.path.isabs(new_url) would work the same and without those possible breakages. The benefit of checking for RelPath instances was only to better express intent.

Comment thread dvc/external_repo.py Outdated
upstream_name = self.config["core"].get("remote")
src_repo = Repo(self.url)
try:
if upstream_name:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try:
   self._add_upstream(src_repo)
...

and move the logic of resolving upstream into the _add_upstream method. Cause here the upstream is added, what type of it and how - can be handled inside the method
If you want to use conditions here then methods can be named differently, for example as

try:
    if remote_name:
      self._add_upstream_from_remote(..) # not sure in relevance of `add_upstream`
   else:
      self._add_upstream_from_repo(...)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that maybe the two separate flows are not required? Consider my question here #3378 (comment)

If instead of going one way or another depending on if there is a default remote set, I go through every remote and replace every RelPath with the one from the source repo, and then I check if there is a default repo set up (and handle that accordingly) there will be a single branch of execution.

Something like this:

    def _fix_remotes(self):
        if not os.path.isdir(self.url):
            return

        src_repo = Repo(self.url)
        try:
            self._fix_source_relative_remotes(src_repo)
            if "remote" not in self.config["core"]:
                self._set_upstream(src_repo)
        finally:
            src_repo.close()

    def _fix_source_relative_remotes(self, src_repo):
        for remote_name, new_remote in self.config["remote"].items():
            if isinstance(new_remote["url"], RelPath):
                old_remote = src_repo.config["remote"][remote_name]
                new_remote["url"] = old_remote["url"]

/cc @efiop

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename _set_upstream to _set_upstream_to_repo (also, the comment there is probably not needed anymore, it is describing what the three lines of code in that function already express clearly).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tizoc I think two branches of execution are necessary, since _set_upstream is only needed when there is no default remote and _fix_upstream is only needed when there is a default remote but it is a local one with relpath url.

@JIoJIaJIu So you mean moving

                if upstream_name:
                    self._fix_local_remote(src_repo, upstream_name)
                else:
                    self._add_upstream(src_repo)

into a new method? Well, we could, but I don't see much value in doing that, tbh. I would be fine with the current approach as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop my reasoning is, if relative urls are imported from the source repo, we want all of them to be kept relative to the source repo, right? not just the one that happens to be the default. If that is the case, that is independent from if there is a default remote set or not.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tizoc But we don't really care about other remotes in this case. We only care about default remote, as it is the only thing that we are actually going to use, at least for now.

@tizoc
Copy link
Copy Markdown
Contributor Author

tizoc commented Feb 26, 2020

I just pushed the requested changes.

I have not included the change I proposed in #3378 (comment)

If you think doing that would be an improvement I will commit that too (but using os.path.isabs instead of checking for RelPath as per #3378 (comment) )

Copy link
Copy Markdown
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread dvc/external_repo.py
Comment on lines +143 to +145
if new_remote["url"] != old_remote["url"]:
new_remote["url"] = old_remote["url"]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would at least add a comment about what is going on here, though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Btw, should I squash all commits into a single one?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you wish. But we'll squash automatically anyways 🙂 Though if you think this is a final patch, you need to remove WIP prefix from the header.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah cool! no need then. "WIP" removed.

@tizoc tizoc changed the title [WIP] erepo: fix relative urls for remotes (#2756) erepo: fix relative url on default remote (#2756) Feb 26, 2020
@efiop efiop merged commit b136eb5 into treeverse:master Feb 26, 2020
@efiop
Copy link
Copy Markdown
Contributor

efiop commented Feb 26, 2020

Thanks @tizoc ! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants