LSIF: Use repository ids instead of names by efritz · Pull Request #7739 · sourcegraph/sourcegraph-public-snapshot

efritz · 2020-01-14T22:36:36Z

Replace usages of repository names in lsif-server with the repository ID known by the frontend. Closes https://github.com/sourcegraph/sourcegraph/issues/6278.

schema changes: these changes will do a join on the repo table so that the repository ids are populated with the id of the repository that has the same name. If a repository id cannot be correlated (due to a rename that has already occurred), those uploads are removed.
- rename lsif_uploads.repository to repository_name_at_upload
- add new field lsif_uploads.repository_id
- add new field lsif_commits.repository_id
- drop lsif_commits.repository
- remove all indexes/constraints from repository and add symmetric ones to repository_id (both tables)
lsif-server api changes:
- accept repository_id and (optionally) repository_name instead of repository for all query endpoints
- accept repository_id and (required, see see Slack thread) repository_name instead of repository for the upload endpoint
lsif-server internal changes: use repository_id everywhere where repository was used before (except for gitserver interfacing, which needs to use the repository name value, if supplied to the API)
lsif-server tests: this is where most of the diff is - but they're all trivial differences
graphql api changes: remove InputRepoName, as we no longer have this data and recent changes to the uploads graphql api has made it an unnecessary datapoint (you must specify a repo in order to list uploads in the first place)
web changes: update usage of the graphql api

keegancsmith

LGTM. Two questions/comments:

Why do you still thread through repo name in many places with the ID?
Original design decision behind lsif was to not be in the same DB as repo. You are now doing joins on it, which will make it very tricky to undo.

On the second point I think this is fine though because the original concern was behind the size of the lsif tables. But I think its just metadata in the table, and the bulk of the data is in lsif-server itself. Is my understanding correct?

efritz · 2020-01-16T14:52:53Z

Original design decision behind lsif was to not be in the same DB as repo. You are now doing joins on it, which will make it very tricky to undo.

I made it a point that lsif-server only touches the lsif-server tables. The repo table is only joined during migrations, which are guaranteed to run on the same machine. There are no such queries at runtime (unless I missed a point somewhere). Not doing this data transition during migrations would cost a manual step with a network call for every unique repo that has LSIF data.

efritz · 2020-01-16T14:54:52Z

Why do you still thread through repo name in many places with the ID?

Because when we interface with gitserver we still need to know the name (both to calculate the shard and to actually perform the request). This can be simplified in a cleanup PR after https://github.com/sourcegraph/sourcegraph/issues/7812.

keegancsmith · 2020-01-16T15:01:25Z

The repo table is only joined during migrations, which are guaranteed to run on the same machine.

Aah of course. Great :D

chrismwendt

Notes accumulated during the review for this change in case they're useful to anyone else later on:

During LSIF upload processing, the LSIF worker must contact a specific gitserver shard that contains the given repository to update the lsif_commits table, visibility, etc.
Which gitserver shard to contact depends on the repo name, so repo ID alone is useless
The problem is how to get the worker to contact the right gitserver
You proposed 3 solutions to this in https://sourcegraph.slack.com/archives/CHXHX7XAS/p1579105999057300?thread_ts=1579105544.057100&cid=CHXHX7XAS
For now, you're going with (1) thread repo name through upload and into the worker, accepting the async rename edge case
At some point either (2) or (3) will be implemented and repo name threading will be removed (probably (3) because it's close to ready to merge https://github.com/sourcegraph/sourcegraph/pull/7828)
- Correction from @efritz: Actually the solution we’ll move to is close to but not exactly 2 or 3, but will depend on a new proxy endpoint in the frontend that translates requests-by-id to requests-by-name for the correct shard.

web/src/enterprise/repo/settings/backend.tsx

Co-Authored-By: Chris Wendt <chrismwendt@gmail.com>

codecov · 2020-01-17T00:44:12Z

Codecov Report

Merging #7739 into master will increase coverage by 0.03%.
The diff coverage is 48.86%.

@@            Coverage Diff             @@
##           master    #7739      +/-   ##
==========================================
+ Coverage   40.54%   40.58%   +0.03%     
==========================================
  Files        1267     1276       +9     
  Lines       66398    66793     +395     
  Branches     6230     6313      +83     
==========================================
+ Hits        26923    27109     +186     
- Misses      37012    37216     +204     
- Partials     2463     2468       +5

Impacted Files	Coverage Δ
cmd/repo-updater/repos/syncer.go	`69.59% <ø> (-0.37%)`	⬇️
cmd/repo-updater/repos/awscodecommit.go	`65.83% <ø> (-0.29%)`	⬇️
web/src/Layout.tsx	`0% <ø> (ø)`	⬆️
...campaigns/detail/changesets/CampaignChangesets.tsx	`100% <ø> (ø)`	⬆️
cmd/repo-updater/repos/bitbucketserver.go	`78.9% <ø> (-0.09%)`	⬇️
shared/src/api/client/types/hover.ts	`95.65% <ø> (ø)`	⬆️
cmd/repo-updater/repos/phabricator.go	`51.51% <ø> (-0.37%)`	⬇️
cmd/frontend/graphqlbackend/a8n.go	`0% <ø> (ø)`	⬆️
web/src/enterprise/repo/settings/backend.tsx	`0% <ø> (ø)`	⬆️
cmd/repo-updater/repos/store.go	`83.19% <ø> (-0.14%)`	⬇️
... and 78 more

eseliger

other than my comment, web changes look 🌟

web/src/enterprise/repo/settings/RepoSettingsLsifUploadPage.tsx

efritz changed the base branch from master to lsif-cleanup-legacy-endpoints January 14, 2020 22:36

efritz changed the base branch from lsif-cleanup-legacy-endpoints to master January 14, 2020 22:37

efritz changed the base branch from master to lsif-cleanup-legacy-endpoints January 14, 2020 23:05

efritz changed the base branch from lsif-cleanup-legacy-endpoints to master January 15, 2020 21:41

Squash commits.

d35f14a

efritz force-pushed the lsif-repository-ids branch from 33902ef to d35f14a Compare January 15, 2020 21:47

Kill fixme.

d012418

efritz changed the title ~~Lsif repository ids~~ LSIF: Use repository ids instead of names Jan 15, 2020

efritz added the lsif label Jan 15, 2020

efritz marked this pull request as ready for review January 15, 2020 23:07

efritz requested a review from emidoots as a code owner January 15, 2020 23:07

efritz requested review from a team January 15, 2020 23:07

keegancsmith approved these changes Jan 16, 2020

View reviewed changes

Merge branch 'master' into lsif-repository-ids

f89eac7

efritz requested a review from chrismwendt January 16, 2020 23:07

chrismwendt approved these changes Jan 17, 2020

View reviewed changes

web/src/enterprise/repo/settings/backend.tsx Outdated Show resolved Hide resolved

Update web/src/enterprise/repo/settings/backend.tsx

f4e3434

Co-Authored-By: Chris Wendt <chrismwendt@gmail.com>

eseliger approved these changes Jan 17, 2020

View reviewed changes

web/src/enterprise/repo/settings/RepoSettingsLsifUploadPage.tsx Outdated Show resolved Hide resolved

Respond to feedback.

a0181d5

efritz merged commit 4e93414 into master Jan 17, 2020

efritz deleted the lsif-repository-ids branch January 17, 2020 14:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LSIF: Use repository ids instead of names#7739

LSIF: Use repository ids instead of names#7739
efritz merged 5 commits intomasterfrom
lsif-repository-ids

efritz commented Jan 14, 2020 •

edited by chrismwendt

Loading

Uh oh!

keegancsmith left a comment

Uh oh!

efritz commented Jan 16, 2020

Uh oh!

efritz commented Jan 16, 2020

Uh oh!

keegancsmith commented Jan 16, 2020

Uh oh!

chrismwendt left a comment

Uh oh!

Uh oh!

codecov bot commented Jan 17, 2020 •

edited

Loading

Uh oh!

eseliger left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Comments

Conversation

efritz commented Jan 14, 2020 • edited by chrismwendt Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

keegancsmith left a comment

Choose a reason for hiding this comment

Uh oh!

efritz commented Jan 16, 2020

Uh oh!

efritz commented Jan 16, 2020

Uh oh!

keegancsmith commented Jan 16, 2020

Uh oh!

chrismwendt left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov bot commented Jan 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

eseliger left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Comments

efritz commented Jan 14, 2020 •

edited by chrismwendt

Loading

codecov bot commented Jan 17, 2020 •

edited

Loading