LSIF: Use repository ids instead of names #7739
keegancsmith
left a comment
LGTM. Two questions/comments:
- Why do you still thread through repo name in many places with the ID?
- Original design decision behind lsif was to not be in the same DB as repo. You are now doing joins on it, which will make it very tricky to undo.
On the second point I think this is fine though, because the original concern was about the size of the lsif tables. But I think it's just metadata in the table, and the bulk of the data is in lsif-server itself. Is my understanding correct?
I made it a point that lsif-server only touches the lsif-server tables. The repo table is only joined during migrations, which are guaranteed to run on the same machine. There are no such queries at runtime (unless I missed a point somewhere). Not doing this data transition during migrations would cost a manual step with a network call for every unique repo that has LSIF data.
Because when we interface with gitserver we still need to know the name (both to calculate the shard and to actually perform the request). This can be simplified in a cleanup PR after https://github.com/sourcegraph/sourcegraph/issues/7812.
Aah of course. Great :D
chrismwendt
left a comment
Notes accumulated during the review for this change in case they're useful to anyone else later on:
- During LSIF upload processing, the LSIF worker must contact a specific gitserver shard that contains the given repository to update the lsif_commits table, visibility, etc.
- Which gitserver shard to contact depends on the repo name, so repo ID alone is useless
- The problem is how to get the worker to contact the right gitserver
- You proposed 3 solutions to this in https://sourcegraph.slack.com/archives/CHXHX7XAS/p1579105999057300?thread_ts=1579105544.057100&cid=CHXHX7XAS
- For now, you're going with (1) thread repo name through upload and into the worker, accepting the async rename edge case
- At some point either (2) or (3) will be implemented and repo name threading will be removed (probably (3) because it's close to ready to merge https://github.com/sourcegraph/sourcegraph/pull/7828)
- Correction from @efritz: Actually the solution we’ll move to is close to but not exactly 2 or 3, but will depend on a new proxy endpoint in the frontend that translates requests-by-id to requests-by-name for the correct shard.
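To make the notes above concrete, here is a minimal sketch of why a repo ID alone cannot pick a shard, and of what an id-to-name translation step might look like. Everything in it is an assumption for illustration: `shard_for_name`, `route_by_id`, the hash-modulo scheme, and the in-memory `names_by_id` mapping are hypothetical and are not taken from gitserver or the frontend.

```python
import hashlib
from typing import Dict, Sequence, Tuple


def shard_for_name(repo_name: str, shard_addrs: Sequence[str]) -> str:
    """Pick a shard address from the repo name (assumed hash-modulo scheme)."""
    if not shard_addrs:
        raise ValueError("no shard addresses configured")
    # Hash the name and reduce it to an index into the address list.
    digest = hashlib.md5(repo_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shard_addrs)
    return shard_addrs[index]


def route_by_id(
    repo_id: int,
    names_by_id: Dict[int, str],
    shard_addrs: Sequence[str],
) -> Tuple[str, str]:
    """Translate a request-by-id into a (name, shard address) pair.

    names_by_id stands in for a lookup against the repo table; an
    unknown id raises KeyError.
    """
    name = names_by_id[repo_id]
    return name, shard_for_name(name, shard_addrs)
```

Because the name is looked up at request time in this sketch, a rename only changes the mapping entry and callers keep using the same ID, which is the property that threading the name through the upload does not have.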
Co-Authored-By: Chris Wendt <chrismwendt@gmail.com>
Codecov Report
@@            Coverage Diff             @@
##           master    #7739      +/-   ##
==========================================
+ Coverage   40.54%   40.58%   +0.03%
==========================================
  Files        1267     1276       +9
  Lines       66398    66793     +395
  Branches     6230     6313      +83
==========================================
+ Hits        26923    27109     +186
- Misses      37012    37216     +204
- Partials     2463     2468       +5
eseliger
left a comment
other than my comment, web changes look 🌟
Replace usages of repository names in lsif-server with the repository ID known by the frontend. Closes https://github.com/sourcegraph/sourcegraph/issues/6278.
A migration joins against the repo table so that the repository ids are populated with the id of the repository that has the same name. If a repository id cannot be correlated (due to a rename that has already occurred), those uploads are removed.
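The backfill described here can be sketched roughly as follows. This is not the PR's actual migration: it uses an in-process SQLite connection in place of Postgres so the example is self-contained, and the `lsif_uploads` table, its `repository_name` column, and the `backfill_repository_ids` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def backfill_repository_ids(conn: sqlite3.Connection) -> int:
    """Populate repository_id by name, then drop uploads with no match.

    Returns the number of uploads removed because their repository
    name no longer exists in the repo table (e.g. after a rename).
    """
    # Hypothetical schema change: add the new id column.
    conn.execute("ALTER TABLE lsif_uploads ADD COLUMN repository_id INTEGER")
    # Correlate each upload with the repo that currently has the same name.
    conn.execute(
        """
        UPDATE lsif_uploads
        SET repository_id = (
            SELECT r.id FROM repo r
            WHERE r.name = lsif_uploads.repository_name
        )
        """
    )
    # Uploads that could not be correlated are removed.
    cur = conn.execute("DELETE FROM lsif_uploads WHERE repository_id IS NULL")
    conn.commit()
    return cur.rowcount
```

The join on the repo table happens only inside this one-off step, which matches the point made in the review that no such query runs at runtime.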