Discover similar GitHub repositories using dual-vector semantic search. Submit a repo URL and get ranked results based on purpose (what it does) and tech stack (what it's built with).
Each repository is represented by two 384-dimensional embeddings generated by BAAI/bge-small-en-v1.5:
- Purpose vector — derived from the repo's description, topics, and README
- Stack vector — derived from primary language, language distribution, and parsed dependency manifests (supports 9 formats: requirements.txt, pyproject.toml, package.json, Cargo.toml, pubspec.yaml, go.mod, Gemfile, pom.xml, build.gradle)
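The exact text fed to the embedder for the stack vector is not specified here. The sketch below assumes the stack vector comes from a single plain-text summary. It parses `requirements.txt`, one of the nine supported formats, into package names, then combines them with per-language byte counts like those returned by GitHub's `GET /repos/{owner}/{repo}/languages` endpoint. `parse_requirements` and `build_stack_text` are hypothetical names, and the summary format is illustrative only.

```python
import re

# Package name at the start of a requirement line (roughly PEP 508: letters, digits, . _ -).
_NAME_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def parse_requirements(text: str) -> list[str]:
    """Return lower-cased package names from requirements.txt content."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        # Skip blanks, pip options (-r, -e, --index-url, ...) and URL requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _NAME_RE.match(line)
        if match:
            names.append(match.group(1).lower())
    return names


def build_stack_text(
    primary_language: str, language_bytes: dict[str, int], dependencies: list[str]
) -> str:
    """Combine language shares and dependencies into one string for the embedder."""
    total = sum(language_bytes.values()) or 1  # avoid dividing by zero
    ranked = sorted(language_bytes.items(), key=lambda kv: kv[1], reverse=True)
    parts = [f"Primary language: {primary_language}"]
    if ranked:
        shares = ", ".join(f"{name} {100 * n / total:.0f}%" for name, n in ranked)
        parts.append(f"Languages: {shares}")
    if dependencies:
        parts.append("Dependencies: " + ", ".join(sorted(set(dependencies))))
    return ". ".join(parts)
```

A real manifest parser would need more care: this sketch ignores option lines such as `-r` or `-e` and URL-based requirements entirely instead of resolving them.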
Search performs two parallel queries against Qdrant, then merges and re-ranks using a weighted sum (default 70% purpose / 30% stack). Users can adjust the weights via the frontend slider.
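The following is a minimal sketch of the merge step. It assumes each Qdrant query has already been reduced to a `{repo_id: cosine_similarity}` map. Two choices here are assumptions, not documented behavior. First, the weights are normalized to sum to 1, so slider values need not add up exactly. Second, a repo returned by only one query scores 0.0 on the missing side. `Hit` and `merge_results` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    repo_id: str
    purpose: float  # similarity from the "purpose" query (0.0 if not returned)
    stack: float    # similarity from the "stack" query (0.0 if not returned)
    score: float    # weighted sum used for ranking


def merge_results(
    purpose_hits: dict[str, float],
    stack_hits: dict[str, float],
    weight_purpose: float = 0.7,
    weight_stack: float = 0.3,
    limit: int = 10,
) -> list[Hit]:
    """Merge two {repo_id: similarity} maps and rank them by a weighted sum."""
    total = weight_purpose + weight_stack
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalize so the effective weights always sum to 1 (an assumption).
    wp, ws = weight_purpose / total, weight_stack / total
    merged = []
    for repo_id in purpose_hits.keys() | stack_hits.keys():
        p = purpose_hits.get(repo_id, 0.0)
        s = stack_hits.get(repo_id, 0.0)
        merged.append(Hit(repo_id, p, s, wp * p + ws * s))
    # Highest score first; repo_id breaks ties so the order is deterministic.
    merged.sort(key=lambda h: (-h.score, h.repo_id))
    return merged[:limit]
```

Scoring a one-sided match as 0.0 penalizes repos that only one query returned. An alternative is to fetch both stored vectors for every candidate and compute the missing similarity exactly.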
```
Frontend (React + Vite + TypeScript) -> REST API -> Backend (FastAPI / Python 3.11+)
├── GitHub API Client (httpx)
├── Text Preprocessor (README cleaning, dependency extraction)
├── Embedding Service (sentence-transformers, 384d)
└── Vector Store (Qdrant, named vectors: "purpose" + "stack")
```
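The README-cleaning step of the Text Preprocessor is not described in detail. One plausible approach, sketched here with regular expressions, removes fenced code, images and badges, link targets, HTML tags and bare URLs. It then truncates the result. Truncation is needed in some form because BAAI/bge-small-en-v1.5 accepts at most 512 tokens. `clean_readme` and the 2000-character cap are illustrative assumptions, not the project's actual values.

```python
import re

_FENCED_CODE = re.compile(r"`{3}.*?`{3}", re.DOTALL)  # fenced code blocks
_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")           # images and badges
_LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")           # [text](target)
_HTML_TAG = re.compile(r"<[^>]+>")
_URL = re.compile(r"https?://\S+")
_HEADING = re.compile(r"^#+\s*", re.MULTILINE)
_WHITESPACE = re.compile(r"\s+")


def clean_readme(markdown: str, max_chars: int = 2000) -> str:
    """Reduce README markdown to plain prose for the purpose embedding."""
    text = _FENCED_CODE.sub(" ", markdown)
    text = _IMAGE.sub(" ", text)   # must run before _LINK, which would leave "!alt"
    text = _LINK.sub(r"\1", text)  # keep the link text, drop the target
    text = _HTML_TAG.sub(" ", text)
    text = _URL.sub(" ", text)
    text = _HEADING.sub("", text)  # "## Usage" -> "Usage"
    text = _WHITESPACE.sub(" ", text).strip()
    return text[:max_chars]
```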
- Python 3.11+
- Node.js 18+
- Docker (for Qdrant)
- A GitHub personal access token
```bash
git clone <repo-url> && cd RepoRadar

# Backend
pip install -e ".[dev]"

# Frontend
cd frontend && npm install && cd ..
```

```bash
cp .env.example .env
```

Edit `.env` and set at minimum:
```
GITHUB_PAT=ghp_your_personal_access_token
SESSION_SECRET=some-random-string
```

For GitHub OAuth (optional), also set `GITHUB_CLIENT_ID` and `GITHUB_CLIENT_SECRET`.
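`SESSION_SECRET` only needs to be a long, hard-to-guess string. One way to generate it uses Python's standard `secrets` module. `make_session_secret` is a hypothetical helper name, and 32 bytes is an arbitrary but common choice.

```python
import secrets


def make_session_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string suitable for SESSION_SECRET."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Prints a line that can be pasted into .env
    print(f"SESSION_SECRET={make_session_secret()}")
```

From a shell, the equivalent one-liner is `python -c "import secrets; print(secrets.token_urlsafe(32))"`.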
Start Qdrant:

```bash
docker compose up -d qdrant
```

Qdrant will be available at `localhost:6333`.
Seed the index:

```bash
# Quick test with 20 repos
python scripts/seed_initial.py --limit 20

# Preview without indexing
python scripts/seed_initial.py --dry-run

# Full seed (200+ topics)
python scripts/seed_initial.py
```

Start the backend:

```bash
uvicorn app.main:app --reload
```

The API will be at http://localhost:8000. Verify with:
```bash
curl http://localhost:8000/api/health
```

In a separate terminal, start the frontend:

```bash
cd frontend && npm run dev
```

The dev server runs at http://localhost:5173.
| Method | Path | Description |
|---|---|---|
| GET | `/api/health` | Health check with indexed repo count |
| POST | `/api/search` | Search for similar repos |
| POST | `/api/index` | Manually index a repo |
| GET | `/api/auth/github` | Start GitHub OAuth flow |
| GET | `/api/auth/callback` | OAuth callback |
| GET | `/api/user/repos` | List the authenticated user's repos |
Example search request:

```bash
curl -X POST http://localhost:8000/api/search \
  -H 'Content-Type: application/json' \
  -d '{
    "repo_url": "pallets/flask",
    "weight_purpose": 0.7,
    "weight_stack": 0.3,
    "limit": 10,
    "min_stars": 50
  }'
```

Re-index stale repos (older than 7 days by default):
```bash
# Default threshold (7 days)
python scripts/update_stale.py

# Custom threshold
python scripts/update_stale.py --stale-days 14

# Preview without re-indexing
python scripts/update_stale.py --dry-run
```

```bash
# Run tests
pytest

# Run tests excluding slow embedding tests
pytest --ignore=tests/test_embedder.py

# Lint and format
ruff check .
ruff format .
```

Run the backend and Qdrant together:
```bash
docker compose up -d
```

This starts:

- Qdrant on `localhost:6333` (data persisted in a Docker volume)
- Backend on `localhost:8000` (hot-reloads via volume mount)
The backend waits for Qdrant's health check before starting. Seed data after both services are up:
```bash
docker compose exec backend python scripts/seed_initial.py --limit 20
```

The repo includes a `render.yaml` Blueprint. To deploy on Render:
- Push the repo to GitHub.
- In the Render Dashboard, create a New Blueprint Instance and connect the repo.
- Set the required environment variables when prompted:
  - `GITHUB_PAT`: GitHub personal access token
  - `GITHUB_CLIENT_ID` / `GITHUB_CLIENT_SECRET`: GitHub OAuth app credentials
  - `QDRANT_URL`: full URL of your Qdrant instance (e.g. `https://xyz.aws.cloud.qdrant.io:6333`)
  - `QDRANT_API_KEY`: Qdrant Cloud API key

`SESSION_SECRET` is auto-generated, and `CORS_ORIGINS` is pre-configured for GitHub Pages and localhost.
The service runs on the free tier. The health check path is /api/health.
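Render polls `/api/health` on its own. For a local smoke test or a post-deploy check, a small standard-library poller can wait for the same endpoint to respond. This sketch treats any HTTP 200 as healthy and does not parse the body. The body's format is not specified here beyond including the indexed repo count. `wait_for_health` and its default timings are assumptions.

```python
import time
import urllib.request


def wait_for_health(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll url until it answers HTTP 200; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError, refused connections and socket timeouts all
            # subclass OSError: treat them as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

`wait_for_health("http://localhost:8000/api/health")` returns `True` once the backend is up, or `False` after roughly 60 seconds.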
For production, use Qdrant Cloud instead of a self-hosted instance:
- Create a free-tier cluster.
- Copy the cluster URL and API key.
- Set `QDRANT_URL` and `QDRANT_API_KEY` in your Render environment (or `.env` for local use).
When QDRANT_URL is set, the app uses it instead of QDRANT_HOST/QDRANT_PORT.
The frontend deploys automatically via the deploy-frontend.yml workflow on pushes to master that change files under frontend/.
To configure:
- In your GitHub repo, go to Settings > Pages and set the source to GitHub Actions.
- The workflow builds with `VITE_API_URL` pointing to your Render backend URL.
- After a push, the site is live at `https://<username>.github.io/<repo>/`.
To change the backend URL, edit the VITE_API_URL value in .github/workflows/deploy-frontend.yml.
Every push to master and every pull request runs the CI workflow (.github/workflows/ci.yml):
- Backend: `ruff check` + `ruff format --check` + `pytest` (excluding slow embedding tests)
- Frontend: `eslint` + `vitest` + `tsc` type check
- Backend: FastAPI, httpx, sentence-transformers, Qdrant
- Frontend: React, Vite, TypeScript
- Vector DB: Qdrant (dual named vectors, cosine distance)
- Embedding model: BAAI/bge-small-en-v1.5 (384 dimensions)