perf(github): Cache accessible repos for accessibleOnly search #112548

Draft
jaydgoss wants to merge 14 commits into master from jaygoss/vdy-68-perfgithub-optimize-accessibleonly-repo-search-to-avoid

jaydgoss (Member) commented Apr 8, 2026

Summary

  • The OrganizationIntegrationReposEndpoint (/integrations/{id}/repos/) lets the frontend search for GitHub repos available to a GitHub App installation. When called with accessibleOnly=true and a search query (as the SCM onboarding repo selector does on each debounced keystroke), the previous implementation fetched all installation-accessible repos from the GitHub API (up to 50 pages of 100 = 5,000 repos) on every request, then filtered them with a Python list comprehension.
  • This change caches the full repo list in sentry.cache.default_cache (Redis) for 5 minutes and filters locally on subsequent requests, reducing each typed query from O(pages) GitHub API calls to zero on a cache hit.
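
The cache-then-filter flow described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `InMemoryTTLCache` is a hypothetical stand-in for the Redis-backed cache, and `search_accessible_repos`, the key format, and the `fetch_all_repos` callback are assumed names.

```python
import time
from typing import Callable

CACHE_TTL_SECONDS = 300  # 5 minutes, as stated in the summary


class InMemoryTTLCache:
    """Hypothetical stand-in for a Redis-backed cache exposing get/set with a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key: str, value, timeout: int) -> None:
        self._store[key] = (self._clock() + timeout, value)


def search_accessible_repos(cache, integration_id, query, fetch_all_repos):
    """Return accessible repos whose full name contains `query` (case-insensitive)."""
    key = f"github:accessible-repos:{integration_id}"  # assumed key format
    repos = cache.get(key)
    if repos is None:
        # Cache miss: pay for the paginated fetch once, then reuse the result.
        repos = fetch_all_repos()
        cache.set(key, repos, CACHE_TTL_SECONDS)
    needle = query.lower()
    return [repo for repo in repos if needle in repo["full_name"].lower()]
```

With this shape, only the first keystroke inside the TTL window triggers the paginated fetch. Later keystrokes are served from the cached list.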

Test plan

  • Existing get_repositories tests pass (6/6)
  • New test_get_repositories_accessible_only_caches_repos verifies cache hit path skips /installation/repositories calls
  • Manual testing: second accessibleOnly search returns instantly from cache

Refs VDY-68

jaydgoss added 2 commits April 8, 2026 16:59
When accessibleOnly=true with a search query, the old path fetched all
installation repos (up to 5,000) on every debounced keystroke, then
filtered with a Python list comprehension. Replace this with a cached
set of accessible repo IDs (5-min Redis TTL) combined with the GitHub
Search API, reducing each typed query from O(pages) API calls to a
single search call plus a Redis lookup.

Refs VDY-68
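
For illustration, the intersection step this commit describes (Search API hits checked against a cached set of accessible repo IDs) might look like the sketch below. `filter_search_results` and the item shape are assumptions, not code from the commit.

```python
def filter_search_results(search_items, accessible_ids):
    """Keep only search hits whose repo ID is in the cached accessible-ID set.

    `search_items` is assumed to be a list of dicts with an "id" key, and
    `accessible_ids` a set of repo IDs loaded from the cache.
    """
    return [item for item in search_items if item["id"] in accessible_ids]


# Example: only repo 1 is accessible to the installation.
hits = [{"id": 1, "full_name": "acme/api"}, {"id": 2, "full_name": "other/api"}]
visible = filter_search_results(hits, {1, 3})
```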
…h API

Switch from Search API + cached ID set to caching the full repo list
and filtering locally. This avoids the Search API's shared 30 req/min
rate limit and uses sentry.cache.default_cache (Redis-backed) instead
of django.core.cache (DummyCache in Sentry).

Refs VDY-68

linear-code bot commented Apr 8, 2026

github-actions bot added the Scope: Backend label Apr 8, 2026
jaydgoss added 2 commits April 8, 2026 17:57
Keep the cached repo list unfiltered so the cache is a faithful
snapshot of the GitHub API response. Apply the archived filter in
get_repositories alongside the other transforms. Also let the
accessible_only path handle both with and without a query.

Refs VDY-68

The Search API does not return archived repos, so the archived filter
should only apply to the /installation/repositories paths.

Move no-query path first since accessible_only is only useful with a
query (repeated keystrokes). Combine archived and query filters into
a single pass through to_repo_info.
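
A rough sketch of the single-pass filter described above, applied to an unfiltered cached list. The body of `to_repo_info` here is a hypothetical stand-in (only the name comes from the commit message), and matching the query against `full_name` is an assumption.

```python
def to_repo_info(raw):
    # Hypothetical output shape; the real transform lives in the integration code.
    return {
        "name": raw["name"],
        "identifier": raw["full_name"],
        "default_branch": raw.get("default_branch"),
    }


def select_repos(cached_repos, query=None):
    """Drop archived repos and apply the optional query in one pass."""
    needle = query.lower() if query else None
    return [
        to_repo_info(raw)
        for raw in cached_repos
        if not raw.get("archived")
        and (needle is None or needle in raw["full_name"].lower())
    ]
```

Because the cached list stays unfiltered, the same cache entry can serve both the no-query path and the query path.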
Strip raw GitHub repo dicts down to the 5 fields used by
get_repositories before storing them in the cache. Reduces the cached
size from ~3KB to ~100 bytes per repo.
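
The field-stripping step could be sketched as below. The commit does not list the five fields, so the names in `CACHED_FIELDS` are an assumption chosen for illustration, as are `compact_repo` and `serialized_size`.

```python
import json

# Assumed field names; the commit only says "the 5 fields used by get_repositories".
CACHED_FIELDS = ("id", "name", "full_name", "default_branch", "archived")


def compact_repo(raw):
    """Keep only the fields needed later, so the cached payload stays small."""
    return {field: raw.get(field) for field in CACHED_FIELDS}


def serialized_size(value):
    """Approximate cache footprint as the length of the JSON encoding."""
    return len(json.dumps(value))
```

Comparing `serialized_size(raw)` with `serialized_size(compact_repo(raw))` on a real API response gives a quick check of the savings.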
getsentry configures CACHES with memcached in production, so
django.core.cache.cache works and matches the pattern used by the
rest of the integrations codebase.
Labels

Scope: Backend (automatically applied to PRs that change backend components)
