Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

repo-updater: Use URI field as fallback for repos.GetByName#3922

Merged
keegancsmith merged 9 commits intomasterfrom
core/name-alias
May 13, 2019
Merged

repo-updater: Use URI field as fallback for repos.GetByName#3922
keegancsmith merged 9 commits intomasterfrom
core/name-alias

Conversation

@keegancsmith
Copy link
Member

External tools (such as the browser extension, language extensions, editor
extensions) compute the Sourcegraph repository name using the repository's
hostname and path. For example github.com/gorilla/mux. However, if an
administrator configures an external service with the non-default
repositoryPathPattern the name on Sourcegraph the name of a repository on
Sourcegraph will not match with the externally computed name.

This commit will store the default name in the unused URI field on the repo
table. When the frontend is looking up a repository by name, it will now
fallback to the default name (the URI column) to find a repository. This
magically makes most external tools just work. Additionally the frontend has
redirect logic where if a repository path does not equal its name, it is
redirected.

There are cases where extension relies on the name to be the default name. For
example the go language server uses the name as the default root go package
path. To fix this use-case we will need to extend the extension to pass on the
URI field of the repo. However, things like fetching the file contents will
still work, so if the root path is found in other ways (such as parsing a
config file or import package doc) it will still work.

The initial implementation of this feature returned errors indicating the
correct name to use. However, that would require every client to handle that
case. Instead this implementation works as expected, and in a few cases
requires some clients to support this model. Note: the clients that need
updating are no more broken than before, and generally work better now.

The URI field was picked since this is what its original value
was. Additionally it is an unused field. I considered giving it a new name,
but prefered consistency between the name in the database and the name in the
code.

Test plan: Tested locally. Once on dogfood we will do plenty of follow up testing to ensure extensions and other integrations work.

Part of https://github.com/sourcegraph/sourcegraph/issues/462

@keegancsmith
Copy link
Member Author

@chrismwendt see my comment on go-langserver in the description. Does that make sense?

External tools (such as the browser extension, language extensions, editor
extensions) compute the Sourcegraph repository name using the repository's
hostname and path. For example `github.com/gorilla/mux`. However, if an
administrator configures an external service with the non-default
`repositoryPathPattern` the name on Sourcegraph the name of a repository on
Sourcegraph will not match with the externally computed name.

This commit will store the default name in the unused URI field on the repo
table. When the frontend is looking up a repository by name, it will now
fallback to the default name (the URI column) to find a repository. This
magically makes most external tools just work. Additionally the frontend has
redirect logic where if a repository path does not equal its name, it is
redirected.

There are cases where extension relies on the name to be the default name. For
example the go language server uses the name as the default root go package
path. To fix this use-case we will need to extend the extension to pass on the
`URI` field of the repo. However, things like fetching the file contents will
still work, so if the root path is found in other ways (such as parsing a
config file or import package doc) it will still work.

The initial implementation of this feature returned errors indicating the
correct name to use. However, that would require every client to handle that
case. Instead this implementation works as expected, and in a few cases
requires some clients to support this model. Note: the clients that need
updating are no more broken than before, and generally work better now.

The URI field was picked since this is what its original value
was. Additionally it is an unused field. I considered giving it a new name,
but prefered consistency between the name in the database and the name in the
code.
@codecov
Copy link

codecov bot commented May 10, 2019

Codecov Report

Merging #3922 into master will increase coverage by 0.11%.
The diff coverage is 84.37%.

Impacted Files Coverage Δ
cmd/repo-updater/repos/types.go 76.13% <100%> (+0.35%) ⬆️
cmd/repo-updater/repos/store.go 85.57% <100%> (+0.13%) ⬆️
cmd/frontend/db/repos.go 67.8% <66.66%> (-0.41%) ⬇️
cmd/repo-updater/repos/sources.go 83.08% <88.88%> (+0.19%) ⬆️
cmd/frontend/backend/repos.go 31.68% <0%> (+2.97%) ⬆️
cmd/frontend/backend/repos_vcs.go 29.66% <0%> (+7.62%) ⬆️
cmd/frontend/backend/go_importers.go 59.72% <0%> (+37.5%) ⬆️

Copy link
Contributor

@mrnugget mrnugget left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Straightforward and easy to understand solution, I like it 👍

That being said, this is easy to understand on a technical level, but I had some troubles with the naming. Am I correct in assuming that the URI field contains something like the "original path" of a repository? (I also had to lookup again whether URIs contain a scheme or not and apparently I'm not the only one confused by the RFC 😄 )

I left two suggestions for changes and I also agree with @tsenart that we should have some tests for this, especially since tests would showcase what a URI looks like for different code hosts

@tsenart
Copy link
Contributor

tsenart commented May 13, 2019

Can you clarify what you mean by this? Did you maybe mean organization/project instead of "host"?

@mrnugget: What I looked at was https://github.com/sourcegraph/sourcegraph/blob/master/cmd/repo-updater/repos/testdata/sources/AWSCODECOMMIT/yielded-repos-have-authenticated-CloneURLs.yaml#L51 and https://github.com/sourcegraph/sourcegraph/blob/1c327a778408796499b6c89d11ea4d36b1fe4eca/pkg/conf/reposource/awscodecommit.go#L29:6

The names are bare names as far as I can see, without any hostname (e.g stripe-go, no github.com/stripe/stripe-go)

@keegancsmith
Copy link
Member Author

Am I correct in assuming that the URI field contains something like the "original path" of a repository? (I also had to lookup again whether URIs contain a scheme or not and apparently I'm not the only one confused by the RFC 😄 )

URIs indeed is a bad name for this. However, we are using it for convenience since our repo table already had this column (which was also used incorrectly). I somewhat explained this in the description. I'm happy to add a DB migration adding a column and giving it a better name if you think that is appropriate. How about full_name or canonical_name or normal_name (normal as defined in mathematics)?

@sourcegraph sourcegraph deleted a comment from todo bot May 13, 2019
@sourcegraph sourcegraph deleted a comment from todo bot May 13, 2019
@sourcegraph sourcegraph deleted a comment from todo bot May 13, 2019
@keegancsmith keegancsmith requested a review from mrnugget May 13, 2019 10:15
@mrnugget
Copy link
Contributor

URIs indeed is a bad name for this. However, we are using it for convenience since our repo table already had this column (which was also used incorrectly). I somewhat explained this in the description.

After reading up on URIs vs. URLs again I actually thought that URI is the correct name for it, in the sense of "this is the the unique resource identifier, according to the code host".

I'm happy to add a DB migration adding a column and giving it a better name if you think that is appropriate. How about full_name or canonical_name or normal_name (normal as defined in mathematics)?

I think "canonical name" is a great name. Does it need a migration? I don't know, you decide :) As a newcomer to the codebase and somewhat to the domain knowledge encoded in repo-updater I'm hesitant to decide between "code could be clearer" and "I just don't know the whole thing yet"

@tsenart
Copy link
Contributor

tsenart commented May 13, 2019

I think that URI is a good name for what it should represent.

@todo
Copy link

todo bot commented May 13, 2019

(keegancsmith) Cleanup bitbucket code such that we don't need to

https://github.com/sourcegraph/sourcegraph/blob/aac81413c7d0935ff4e5ef83b839dc0bbeebe796/cmd/repo-updater/repos/sources.go#L424-L429


This comment was generated by todo based on a TODO comment in aac8141 in #3922. cc @sourcegraph.

@chrismwendt
Copy link
Contributor

There are cases where extension relies on the name to be the default name. For
example the go language server uses the name as the default root go package
path. To fix this use-case we will need to extend the extension to pass on the
URI field of the repo.

The Go extension or go-langserver aren't broken because of this, are they?

Currently, both the Go extension and go-langserver assume the Sourcegraph repository name is the same as the host+path of the clone URL (and go-langserver gets file contents from the Sourcegraph raw API). It doesn't seem like that use-case is supported. When you say "To fix this", does that mean "in order to support this in the future"?

@keegancsmith
Copy link
Member Author

There are cases where extension relies on the name to be the default name. For
example the go language server uses the name as the default root go package
path. To fix this use-case we will need to extend the extension to pass on the
URI field of the repo.

The Go extension or go-langserver aren't broken because of this, are they?

Less broken than before. We have always supported repositoryPathPattern, but if an admin used it the extensions would often break. Now they will work in more cases.

Currently, both the Go extension and go-langserver assume the Sourcegraph repository name is the same as the host+path of the clone URL (and go-langserver gets file contents from the Sourcegraph raw API). It doesn't seem like that use-case is supported. When you say "To fix this", does that mean "in order to support this in the future"?

Yeah, to support this in future. Previously if an admin used a different repositoryPathPattern the assumption in go-langserver would fail. I am suggesting we can now always send host+path as an init option, which can then be used for root package path detection.

Note: Before things like the Raw API would of worked. It would of failed though to fetch dependencies via the Raw API since the host+path would of failed against sourcegraph. With this PR, it would work though!

@chrismwendt
Copy link
Contributor

There are cases where extension relies on the name to be the default name. For
example the go language server uses the name as the default root go package
path. To fix this use-case we will need to extend the extension to pass on the
URI field of the repo.

The Go extension or go-langserver aren't broken because of this, are they?

Less broken than before. We have always supported repositoryPathPattern, but if an admin used it the extensions would often break. Now they will work in more cases.

Sweet ✨

Currently, both the Go extension and go-langserver assume the Sourcegraph repository name is the same as the host+path of the clone URL (and go-langserver gets file contents from the Sourcegraph raw API). It doesn't seem like that use-case is supported. When you say "To fix this", does that mean "in order to support this in the future"?

Yeah, to support this in future. Previously if an admin used a different repositoryPathPattern the assumption in go-langserver would fail. I am suggesting we can now always send host+path as an init option, which can then be used for root package path detection.

Currently, these are the InitializeParams sent from the Go extension to go-langserver:

  • originalRootUri: git://github.com/gorilla/mux?c5c6c98bc25355028a63748a498942a6398ccd22
  • rootUri: file:///

Are you suggesting changing that? Coincidentally, I'm changing this in order to make go-langserver more LSP-conformant in sourcegraph/go-langserver#369 (but I'm also trying one other approach, PR pending). Want to chat about this?

Note: Before things like the Raw API would of worked. It would of failed though to fetch dependencies via the Raw API since the host+path would of failed against sourcegraph. With this PR, it would work though!

Sweet again 😄

@keegancsmith
Copy link
Member Author

Currently, both the Go extension and go-langserver assume the Sourcegraph repository name is the same as the host+path of the clone URL (and go-langserver gets file contents from the Sourcegraph raw API). It doesn't seem like that use-case is supported. When you say "To fix this", does that mean "in order to support this in the future"?

Yeah, to support this in future. Previously if an admin used a different repositoryPathPattern the assumption in go-langserver would fail. I am suggesting we can now always send host+path as an init option, which can then be used for root package path detection.

Currently, these are the InitializeParams sent from the Go extension to go-langserver:

  • originalRootUri: git://github.com/gorilla/mux?c5c6c98bc25355028a63748a498942a6398ccd22
  • rootUri: file:///

Are you suggesting changing that? Coincidentally, I'm changing this in order to make go-langserver more LSP-conformant in sourcegraph/go-langserver#369 (but I'm also trying one other approach, PR pending). Want to chat about this?

FYI when we added those initializeParams the initializationOptions field didn't exist (hence us breaking spec). I would think populating initializationOptions with more information would make a lot of sense. What comes to mind is uri (the host+path), commit, ref (master, tag, etc), name (the name on sourcegraph, normally the same as URI), etc. But yes, I was thinking of those fields. My (old) knowledge of the buildserver is we used originalRootUri as part of the root path package detection if we couldn't find something in the repo saying what the root path was.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants