
Eoic/RepoRadar


RepoRadar

Discover similar GitHub repositories using dual-vector semantic search. Submit a repo URL and get ranked results based on purpose (what it does) and tech stack (what it's built with).

How it works

Each repository is represented by two 384-dimensional embeddings generated by BAAI/bge-small-en-v1.5:

  • Purpose vector — derived from the repo's description, topics, and README
  • Stack vector — derived from primary language, language distribution, and parsed dependency manifests (supports 9 formats: requirements.txt, pyproject.toml, package.json, Cargo.toml, pubspec.yaml, go.mod, Gemfile, pom.xml, build.gradle)
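This README does not document how the stack text is assembled before embedding. The sketch below assumes the stack vector is embedded from a single flat string built from three parts: the primary language, the per-language byte counts GitHub reports, and the dependency names. It handles only requirements.txt out of the nine manifest formats. `parse_requirements`, `build_stack_text`, and the string template are hypothetical.

```python
import re

# Package name at the start of a requirement line, e.g. "fastapi>=0.110" -> "fastapi".
_REQ_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def parse_requirements(text: str) -> list[str]:
    """Extract normalized, de-duplicated package names from requirements.txt content."""
    names: list[str] = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options like -r / -e
            continue
        match = _REQ_NAME.match(line)
        if match:
            # PEP 503-style normalization: lowercase, runs of "-", "_", "." become "-".
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            if name not in names:
                names.append(name)
    return names


def build_stack_text(primary: str, languages: dict[str, int], deps: list[str]) -> str:
    """Hypothetical template for the text fed to the embedding model."""
    total = sum(languages.values()) or 1
    # Languages sorted by byte count, shown as rounded percentages.
    ranked = sorted(languages.items(), key=lambda kv: kv[1], reverse=True)
    dist = ", ".join(f"{lang} {100 * size / total:.0f}%" for lang, size in ranked)
    return f"Primary language: {primary}. Languages: {dist}. Dependencies: {', '.join(deps)}."
```

For example, `build_stack_text("Python", {"Python": 900, "Shell": 100}, ["fastapi"])` returns `Primary language: Python. Languages: Python 90%, Shell 10%. Dependencies: fastapi.` Normalizing names keeps `Sentence_Transformers` and `sentence-transformers` from being treated as different packages.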

Search performs two parallel queries against Qdrant, then merges and re-ranks the results using a weighted sum (default 70% purpose / 30% stack). Users can adjust the weights with a slider in the frontend.
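The merge step can be sketched in plain Python. This is an illustration, not RepoRadar's actual code. It assumes each Qdrant query returns a mapping of repo ID to cosine score, and that a repo missing from one result list contributes 0 for that vector; the real handling may differ. `merge_results` is a hypothetical name.

```python
def merge_results(
    purpose: dict[str, float],
    stack: dict[str, float],
    weight_purpose: float = 0.7,
    weight_stack: float = 0.3,
    limit: int = 10,
) -> list[tuple[str, float]]:
    """Combine two score maps with a weighted sum and return the top `limit` repos."""
    combined: dict[str, float] = {}
    for repo_id in purpose.keys() | stack.keys():
        # Assumption: a repo absent from one query scores 0 on that vector.
        combined[repo_id] = (
            weight_purpose * purpose.get(repo_id, 0.0)
            + weight_stack * stack.get(repo_id, 0.0)
        )
    # Highest combined score first; ties broken by repo ID for stable output.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]
```

With purpose scores `{"a": 0.9, "b": 0.6}` and stack scores `{"b": 0.95, "c": 0.8}`, the default weights rank `b` (0.705) above `a` (0.63) and `c` (0.24): a repo that matches moderately on both vectors can outrank one that matches strongly on only one.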

Architecture

Frontend (React + Vite + TypeScript) -> REST API -> Backend (FastAPI / Python 3.11+)
  ├── GitHub API Client (httpx)
  ├── Text Preprocessor (README cleaning, dependency extraction)
  ├── Embedding Service (sentence-transformers, 384d)
  └── Vector Store (Qdrant, named vectors: "purpose" + "stack")
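The README-cleaning rules used by the Text Preprocessor are not spelled out here. As a rough illustration, the sketch below assumes cleaning means a few steps: strip fenced code blocks, HTML tags, image and badge markup, and bare URLs; keep link text; drop heading markers; and collapse whitespace. `clean_readme`, its rules, and the truncation length are hypothetical.

```python
import re


def clean_readme(markdown: str, max_chars: int = 2000) -> str:
    """Reduce a README to plain prose suitable for embedding (illustrative rules)."""
    text = re.sub(r"`{3}.*?`{3}", " ", markdown, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"<[^>]+>", " ", text)                            # HTML tags
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)               # images and badges
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)            # links -> link text
    text = re.sub(r"https?://\S+", " ", text)                       # bare URLs
    text = re.sub(r"^#+\s*", "", text, flags=re.MULTILINE)          # heading markers
    text = re.sub(r"\s+", " ", text).strip()                        # collapse whitespace
    return text[:max_chars]  # arbitrary cap, not a documented limit
```

Images are removed before links so that a badge such as `[![CI](badge.svg)](ci-url)` disappears entirely instead of leaving behind its alt text.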

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Docker (for Qdrant)
  • A GitHub personal access token

Setup

1. Clone and install

git clone <repo-url> && cd RepoRadar

# Backend
pip install -e ".[dev]"

# Frontend
cd frontend && npm install && cd ..

2. Configure environment

cp .env.example .env

Edit .env and set at minimum:

GITHUB_PAT=ghp_your_personal_access_token
SESSION_SECRET=some-random-string

For GitHub OAuth (optional), also set GITHUB_CLIENT_ID and GITHUB_CLIENT_SECRET.
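Any sufficiently random value works for SESSION_SECRET. One way to generate it uses only Python's standard `secrets` module. The 32-byte length is a common choice, not a documented requirement, and `make_session_secret` is a hypothetical helper.

```python
import secrets


def make_session_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string suitable for SESSION_SECRET."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed line into .env.
    print(f"SESSION_SECRET={make_session_secret()}")
```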

3. Start Qdrant

docker compose up -d qdrant

Qdrant will be available at localhost:6333.

4. Seed the database

# Quick test with 20 repos
python scripts/seed_initial.py --limit 20

# Preview without indexing
python scripts/seed_initial.py --dry-run

# Full seed (200+ topics)
python scripts/seed_initial.py

5. Start the backend

uvicorn app.main:app --reload

The API will be available at http://localhost:8000. Verify that it is running with:

curl http://localhost:8000/api/health

6. Start the frontend

cd frontend && npm run dev

The app opens at http://localhost:5173 (Vite's default dev-server port).

API endpoints

Method  Path                  Description
GET     /api/health           Health check with indexed repo count
POST    /api/search           Search for similar repos
POST    /api/index            Manually index a repo
GET     /api/auth/github      Start GitHub OAuth flow
GET     /api/auth/callback    OAuth callback
GET     /api/user/repos       List authenticated user's repos

Search example

curl -X POST http://localhost:8000/api/search \
  -H 'Content-Type: application/json' \
  -d '{
    "repo_url": "pallets/flask",
    "weight_purpose": 0.7,
    "weight_stack": 0.3,
    "limit": 10,
    "min_stars": 50
  }'
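The same request can be issued from Python using only the standard library. This sketch mirrors the curl payload above. `build_search_request`, `search`, and `API_BASE` are hypothetical names, and the base URL assumes the local backend from the setup steps. The response format is not documented in this README, so `search` simply decodes whatever JSON comes back.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # assumption: local backend from setup step 5


def build_search_request(
    repo_url: str,
    weight_purpose: float = 0.7,
    weight_stack: float = 0.3,
    limit: int = 10,
    min_stars: int = 50,
) -> urllib.request.Request:
    """Build a POST /api/search request with the same JSON body as the curl example."""
    payload = {
        "repo_url": repo_url,
        "weight_purpose": weight_purpose,
        "weight_stack": weight_stack,
        "limit": limit,
        "min_stars": min_stars,
    }
    return urllib.request.Request(
        f"{API_BASE}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(repo_url: str, **kwargs: object) -> object:
    """Send the request and decode the JSON response (requires a running backend)."""
    request = build_search_request(repo_url, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read())
```

Calling `search("pallets/flask")` against a running backend should correspond to the curl command above.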

Maintenance

Re-index stale repos (older than 7 days by default):

python scripts/update_stale.py
python scripts/update_stale.py --stale-days 14
python scripts/update_stale.py --dry-run
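The staleness rule comes down to comparing each repo's last-indexed timestamp with a cutoff. The sketch below assumes each indexed repo has a timezone-aware `indexed_at` datetime; the field name actually stored in the Qdrant payload is not documented here. `find_stale` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def find_stale(
    indexed_at: dict[str, datetime],
    stale_days: int = 7,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return repo names whose last index time is older than `stale_days` days."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=stale_days)
    # Oldest first, so the most out-of-date repos would be refreshed first.
    stale = [(ts, name) for name, ts in indexed_at.items() if ts < cutoff]
    return [name for _, name in sorted(stale)]
```

Passing `now` explicitly makes the cutoff deterministic. A `--dry-run` mode could print this list without re-indexing anything.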

Development

# Run tests
pytest

# Run tests excluding slow embedding tests
pytest --ignore=tests/test_embedder.py

# Lint and format
ruff check .
ruff format .

Deployment

Docker Compose (full stack locally)

Run the backend and Qdrant together:

docker compose up -d

This starts:

  • Qdrant on localhost:6333 (data persisted in a Docker volume)
  • Backend on localhost:8000 (hot-reloads via volume mount)

The backend waits for Qdrant's health check to pass before starting. Once both services are up, seed the database:

docker compose exec backend python scripts/seed_initial.py --limit 20

Render (backend)

The repo includes a render.yaml Blueprint. To deploy:

  1. Push the repo to GitHub.
  2. In the Render Dashboard, create a New Blueprint Instance and connect the repo.
  3. Set the required environment variables when prompted:
    • GITHUB_PAT — GitHub personal access token
    • GITHUB_CLIENT_ID / GITHUB_CLIENT_SECRET — GitHub OAuth app credentials
    • QDRANT_URL — full URL of your Qdrant instance (e.g. https://xyz.aws.cloud.qdrant.io:6333)
    • QDRANT_API_KEY — Qdrant Cloud API key
  4. SESSION_SECRET is generated automatically, and CORS_ORIGINS is preconfigured for GitHub Pages and localhost, so neither needs to be set manually.

The service runs on the free tier. The health check path is /api/health.

Qdrant Cloud

For production, use Qdrant Cloud instead of a self-hosted instance:

  1. Create a free-tier cluster.
  2. Copy the cluster URL and API key.
  3. Set QDRANT_URL and QDRANT_API_KEY in your Render environment (or .env for local use).

When QDRANT_URL is set, the app uses it instead of QDRANT_HOST/QDRANT_PORT.
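The precedence rule can be expressed as a small helper. This is a sketch, not the app's settings code: the fallback values `localhost` and `6333` come from the local setup section, and `qdrant_location` is a hypothetical name. It reads from any mapping, so it can be tested without touching the real environment.

```python
import os
from collections.abc import Mapping
from typing import Optional


def qdrant_location(env: Optional[Mapping[str, str]] = None) -> dict[str, object]:
    """Return connection settings, preferring QDRANT_URL over QDRANT_HOST/QDRANT_PORT."""
    env = os.environ if env is None else env
    url = env.get("QDRANT_URL")
    if url:
        # Cloud or other remote instance: full URL plus optional API key.
        return {"url": url, "api_key": env.get("QDRANT_API_KEY")}
    # Local fallback, e.g. the docker compose service on localhost:6333.
    return {
        "host": env.get("QDRANT_HOST", "localhost"),
        "port": int(env.get("QDRANT_PORT", "6333")),
    }
```

Because `qdrant_client.QdrantClient` accepts `url`, `api_key`, `host`, and `port` keyword arguments, the returned dict could be unpacked into it directly.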

GitHub Pages (frontend)

The frontend deploys automatically via the deploy-frontend.yml workflow on pushes to master that change files under frontend/.

To configure:

  1. In your GitHub repo, go to Settings > Pages and set the source to GitHub Actions.
  2. The workflow builds with VITE_API_URL pointing to your Render backend URL.
  3. After a push, the site is live at https://<username>.github.io/<repo>/.

To change the backend URL, edit the VITE_API_URL value in .github/workflows/deploy-frontend.yml.

CI pipeline

Every push to master and every pull request runs the CI workflow (.github/workflows/ci.yml):

  • Backend: ruff check + ruff format --check + pytest (excluding slow embedding tests)
  • Frontend: eslint + vitest + tsc type check

Tech stack

  • Backend: FastAPI, httpx, sentence-transformers, Qdrant
  • Frontend: React, Vite, TypeScript
  • Vector DB: Qdrant (dual named vectors, cosine distance)
  • Embedding model: BAAI/bge-small-en-v1.5 (384 dimensions)
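Qdrant computes cosine similarity internally. For reference, the score for two embeddings u and v is their dot product divided by the product of their norms. Here is a plain-Python version; the real vectors have 384 components, but the checks below use short ones for readability. If embeddings are L2-normalized, this reduces to a plain dot product.

```python
import math
from collections.abc import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)
```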

About

Repository similarity search.
