Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

LSIF: Use repository ids instead of names#7739

Merged
efritz merged 5 commits intomasterfrom
lsif-repository-ids
Jan 17, 2020
Merged

LSIF: Use repository ids instead of names#7739
efritz merged 5 commits intomasterfrom
lsif-repository-ids

Conversation

@efritz
Copy link
Contributor

@efritz efritz commented Jan 14, 2020

Replace usages of repository names in lsif-server with the repository ID known by the frontend. Closes https://github.com/sourcegraph/sourcegraph/issues/6278.

  • schema changes: these changes will do a join on the repo table so that the repository ids are populated with the id of the repository that has the same name. If a repository id cannot be correlated (due to a rename that has already occurred), those uploads are removed.
    • rename lsif_uploads.repository to repository_name_at_upload
    • add new field lsif_uploads.repository_id
    • add new field lsif_commits.repository_id
    • drop lsif_commits.repository
    • remove all indexes/constraints from repository and add symmetric ones to repository_id (both tables)
  • lsif-server api changes:
    • accept repository_id and (optionally) repository_name instead of repository for all query endpoints
    • accept repository_id and (required, see see Slack thread) repository_name instead of repository for the upload endpoint
  • lsif-server internal changes: use repository_id everywhere where repository was used before (except for gitserver interfacing, which needs to use the repository name value, if supplied to the API)
  • lsif-server tests: this is where most of the diff is - but they're all trivial differences
  • graphql api changes: remove InputRepoName, as we no longer have this data and recent changes to the uploads graphql api has made it an unnecessary datapoint (you must specify a repo in order to list uploads in the first place)
  • web changes: update usage of the graphql api

@efritz efritz changed the base branch from master to lsif-cleanup-legacy-endpoints January 14, 2020 22:36
@efritz efritz changed the base branch from lsif-cleanup-legacy-endpoints to master January 14, 2020 22:37
@efritz efritz changed the base branch from master to lsif-cleanup-legacy-endpoints January 14, 2020 23:05
@efritz efritz changed the base branch from lsif-cleanup-legacy-endpoints to master January 15, 2020 21:41
@efritz efritz force-pushed the lsif-repository-ids branch from 33902ef to d35f14a Compare January 15, 2020 21:47
@efritz efritz changed the title Lsif repository ids LSIF: Use repository ids instead of names Jan 15, 2020
@efritz efritz added the lsif label Jan 15, 2020
@efritz efritz marked this pull request as ready for review January 15, 2020 23:07
@efritz efritz requested a review from emidoots as a code owner January 15, 2020 23:07
@efritz efritz requested review from a team January 15, 2020 23:07
Copy link
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Two questions/comments:

  • Why do you still thread through repo name in many places with the ID?
  • Original design decision behind lsif was to not be in the same DB as repo. You are now doing joins on it, which will make it very tricky to undo.

On the second point I think this is fine though because the original concern was behind the size of the lsif tables. But I think its just metadata in the table, and the bulk of the data is in lsif-server itself. Is my understanding correct?

@efritz
Copy link
Contributor Author

efritz commented Jan 16, 2020

Original design decision behind lsif was to not be in the same DB as repo. You are now doing joins on it, which will make it very tricky to undo.

I made it a point that lsif-server only touches the lsif-server tables. The repo table is only joined during migrations, which are guaranteed to run on the same machine. There are no such queries at runtime (unless I missed a point somewhere). Not doing this data transition during migrations would cost a manual step with a network call for every unique repo that has LSIF data.

@efritz
Copy link
Contributor Author

efritz commented Jan 16, 2020

Why do you still thread through repo name in many places with the ID?

Because when we interface with gitserver we still need to know the name (both to calculate the shard and to actually perform the request). This can be simplified in a cleanup PR after https://github.com/sourcegraph/sourcegraph/issues/7812.

@keegancsmith
Copy link
Member

The repo table is only joined during migrations, which are guaranteed to run on the same machine.

Aah of course. Great :D

@efritz efritz requested a review from chrismwendt January 16, 2020 23:07
Copy link
Contributor

@chrismwendt chrismwendt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes accumulated during the review for this change in case they're useful to anyone else later on:

  • During LSIF upload processing, the LSIF worker must contact a specific gitserver shard that contains the given repository to update the lsif_commits table, visibility, etc.
  • Which gitserver shard to contact depends on the repo name, so repo ID alone is useless
  • The problem is how to get the worker to contact the right gitserver
  • You proposed 3 solutions to this in https://sourcegraph.slack.com/archives/CHXHX7XAS/p1579105999057300?thread_ts=1579105544.057100&cid=CHXHX7XAS
  • For now, you're going with (1) thread repo name through upload and into the worker, accepting the async rename edge case
  • At some point either (2) or (3) will be implemented and repo name threading will be removed (probably (3) because it's close to ready to merge https://github.com/sourcegraph/sourcegraph/pull/7828)
    • Correction from @efritz: Actually the solution we’ll move to is close to but not exactly 2 or 3, but will depend on a new proxy endpoint in the frontend that translates requests-by-id to requests-by-name for the correct shard.

Co-Authored-By: Chris Wendt <chrismwendt@gmail.com>
@codecov
Copy link

codecov bot commented Jan 17, 2020

Codecov Report

Merging #7739 into master will increase coverage by 0.03%.
The diff coverage is 48.86%.

@@            Coverage Diff             @@
##           master    #7739      +/-   ##
==========================================
+ Coverage   40.54%   40.58%   +0.03%     
==========================================
  Files        1267     1276       +9     
  Lines       66398    66793     +395     
  Branches     6230     6313      +83     
==========================================
+ Hits        26923    27109     +186     
- Misses      37012    37216     +204     
- Partials     2463     2468       +5
Impacted Files Coverage Δ
cmd/repo-updater/repos/syncer.go 69.59% <ø> (-0.37%) ⬇️
cmd/repo-updater/repos/awscodecommit.go 65.83% <ø> (-0.29%) ⬇️
web/src/Layout.tsx 0% <ø> (ø) ⬆️
...campaigns/detail/changesets/CampaignChangesets.tsx 100% <ø> (ø) ⬆️
cmd/repo-updater/repos/bitbucketserver.go 78.9% <ø> (-0.09%) ⬇️
shared/src/api/client/types/hover.ts 95.65% <ø> (ø) ⬆️
cmd/repo-updater/repos/phabricator.go 51.51% <ø> (-0.37%) ⬇️
cmd/frontend/graphqlbackend/a8n.go 0% <ø> (ø) ⬆️
web/src/enterprise/repo/settings/backend.tsx 0% <ø> (ø) ⬆️
cmd/repo-updater/repos/store.go 83.19% <ø> (-0.14%) ⬇️
... and 78 more

Copy link
Member

@eseliger eseliger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other than my comment, web changes look 🌟

@efritz efritz merged commit 4e93414 into master Jan 17, 2020
@efritz efritz deleted the lsif-repository-ids branch January 17, 2020 14:17
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LSIF server repository names

4 participants

Comments