Bridging the gap between "Good First Issues" and "Great First Contributions" via AI-driven mentorship.
IssueSight is a distributed, event-driven platform designed to solve a specific problem in the Open Source ecosystem: Context Switching.
Junior engineers often struggle to contribute not because they can't code, but because they lack the domain context of massive repositories. IssueSight ingests GitHub issues and uses LLMs to generate "Context Bridges" by breaking down complex tickets into junior-level prerequisites, architectural summaries, and implementation guides.
The system follows a vertical Microservices Layering pattern in a monorepo structure. Traffic flows from the Next.js Frontend (Client) through the Go Gateway (Center) down to the Persistence Layer (Bottom).
```mermaid
---
config:
  theme: neo-dark
---
flowchart TB
  subgraph ClientLayer["1. Client Layer"]
    direction TB
    UserApp("User")
  end
  subgraph GatewayLayer["2. Gateway Layer"]
    direction TB
    APIGateway["API Gateway"]
    AuthMgr["Auth & Quota Manager"]
    LockMgr["Lock Manager"]
  end
  subgraph ExternalLayer["5. External Ecosystem"]
    GitHub("GitHub API")
    LLM("LLM Provider")
  end
  subgraph LogicLayer["3. Logic & Processing Layer"]
    direction TB
    Collector["Collector Worker"]
    AIWorker["AI Generator Worker"]
  end
  subgraph DataLayer["4. Data & State Layer"]
    direction TB
    MongoDB[("MongoDB\nAuth & Quotas")]
    Redis[("Redis Speed Layer\nCache/Locks/Stream")]
    Postgres[("PostgreSQL\nTutorial Archive")]
  end
  UserApp -- "1. Submit Issue / Auth" --> APIGateway
  APIGateway -.-> AuthMgr & LockMgr
  AuthMgr -- "2. Check Limit" --> MongoDB
  LockMgr -- "3. Distributed Lock" --> Redis
  APIGateway -- "4. Enqueue Task" --> Redis
  Collector -- "5. Poll Metadata" --> GitHub
  Collector -- "6. Push Context" --> Redis
  Redis -- "7. Stream Consume" --> AIWorker
  AIWorker -- "8. Generate Content" --> LLM
  AIWorker -- "9. Persist Tutorial" --> Postgres
  UserApp:::client
  APIGateway:::gateway
  AuthMgr:::gateway
  LockMgr:::gateway
  GitHub:::external
  LLM:::external
  Collector:::worker
  AIWorker:::worker
  MongoDB:::data
  Redis:::data
  Postgres:::data
  classDef client fill:#fff3e0,stroke:#f57c00,stroke-width:2px,rx:10,ry:10
  classDef gateway fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,rx:5,ry:5
  classDef worker fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,rx:5,ry:5
  classDef data fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
  classDef external fill:#eeeeee,stroke:#999999,stroke-width:2px,stroke-dasharray: 5 5,rx:5,ry:5
  style UserApp fill:#00C853,color:#000000
  style APIGateway fill:#2962FF
  style AuthMgr fill:#2962FF
  style LockMgr fill:#2962FF
  style GitHub fill:#2962FF
  style LLM fill:#00C853
  style Collector fill:#2962FF
  style AIWorker fill:#FFD600,color:#000000
  style MongoDB fill:#FF6D00
  style Redis fill:#2962FF
  style Postgres fill:#00C853
  style GatewayLayer stroke:#00C853,fill:#00C853,color:#000000
  style DataLayer fill:#FF6D00,color:#000000
  style LogicLayer fill:#00C853,color:#000000
  style ExternalLayer fill:#FFD600,color:#000000
  style ClientLayer fill:#BBDEFB,color:#000000
  linkStyle 0 stroke:#f57c00,stroke-width:2px,fill:none
  linkStyle 1 stroke:#2962FF,fill:none
  linkStyle 2 stroke:#2962FF,fill:none
  linkStyle 3 stroke:#2962FF,fill:none
  linkStyle 4 stroke:#2962FF,fill:none
  linkStyle 5 stroke:#2962FF,fill:none
  linkStyle 6 stroke:#000000,fill:none
  linkStyle 7 stroke:#000000,fill:none
  linkStyle 8 stroke:#2e7d32,stroke-width:2px,fill:none
  linkStyle 9 stroke:#2e7d32,stroke-width:2px,fill:none
  linkStyle 10 stroke:#2962FF,fill:none
```
- **Ingestion (The Write Path, blue lines):** A background `Collector` service polls GitHub and pushes raw events to a Redis Stream. This ensures that if the GitHub API is slow or rate-limited, it does not block the rest of the application.
- **Processing (The Worker):** The `AI Worker` consumes the stream, using OpenAI to analyze code complexity. It determines whether an issue is truly "Junior Friendly" or requires advanced knowledge.
- **Serving (The Read Path, orange lines):** The `API Gateway` serves the frontend. It implements a Cache-Aside strategy: popular issues are served from Redis KV memory (<5 ms), while the database is hit only on cache misses.
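The Cache-Aside read path can be sketched as follows. The in-memory `Cache` and `TutorialStore` types below are hypothetical stand-ins for the real Redis and PostgreSQL clients, not the actual gateway code:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is a cached value with an expiry, mimicking a Redis SET ... EX.
type entry struct {
	val     string
	expires time.Time
}

// Cache is a minimal stand-in for the Redis KV layer.
type Cache struct {
	mu sync.Mutex
	m  map[string]entry
}

func NewCache() *Cache { return &Cache{m: map[string]entry{}} }

func (c *Cache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.m[key]
	if !ok || time.Now().After(e.expires) {
		return "", false
	}
	return e.val, true
}

func (c *Cache) Set(key, val string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key] = entry{val: val, expires: time.Now().Add(ttl)}
}

// TutorialStore simulates the PostgreSQL archive; DBHits counts queries.
type TutorialStore struct {
	rows   map[string]string
	DBHits int
}

// GetTutorial implements cache-aside: try the cache first, fall back to the
// store on a miss, then populate the cache for subsequent readers.
func GetTutorial(c *Cache, s *TutorialStore, id string) (string, error) {
	if v, ok := c.Get("tutorial:" + id); ok {
		return v, nil // cache hit: served from memory
	}
	s.DBHits++ // simulate one SELECT against Postgres
	v, ok := s.rows[id]
	if !ok {
		return "", fmt.Errorf("tutorial %s not found", id)
	}
	c.Set("tutorial:"+id, v, 10*time.Minute)
	return v, nil
}

func main() {
	cache := NewCache()
	store := &TutorialStore{rows: map[string]string{"42": "How to fix issue #42"}}
	for i := 0; i < 3; i++ {
		v, _ := GetTutorial(cache, store, "42")
		fmt.Println(v)
	}
	fmt.Println("db hits:", store.DBHits) // prints "db hits: 1"
}
```

The TTL bounds staleness: a popular tutorial is re-read from Postgres at most once per expiry window, no matter how many users request it.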
The database schema follows a normalized relational design with PostgreSQL as the primary data store. The ERD below illustrates the core entities and their relationships:
```mermaid
erDiagram
  USERS ||--o{ USER_IDENTITIES : "authenticates_via"
  USERS ||--o{ TUTORIALS : "unlocks"
  PROJECTS ||--o{ GITHUB_ISSUES : "contains"
  PROJECTS ||--o{ PROJECT_CONCEPTS : "categorized_by"
  GITHUB_ISSUES ||--o| TUTORIAL_CONTENTS : "generates"
  CONCEPTS ||--o{ PROJECT_CONCEPTS : "defines"
  CONCEPTS ||--o{ TUTORIAL_CONCEPTS : "tags"
  CONCEPTS ||--o{ CONCEPT_RELATIONSHIPS : "is_parent_of"
  CONCEPTS ||--o{ CONCEPT_RELATIONSHIPS : "is_child_of"
  TUTORIAL_CONTENTS ||--o{ TUTORIALS : "serves"
  TUTORIAL_CONTENTS ||--o{ TUTORIAL_CONCEPTS : "explains"
  USERS {
    uuid id PK
    string email UK
    string display_name
    string avatar_url
    timestamp last_requested_at "Quota_Anchor"
    timestamp created_at
  }
  USER_IDENTITIES {
    uuid id PK
    uuid user_id FK
    string provider "github_or_google"
    string provider_id UK "External_ID"
  }
  PROJECTS {
    uuid id PK
    bigint gh_repo_id UK
    string owner_handle
    string repo_name
    string full_name UK
    string language
    timestamp created_at
  }
  GITHUB_ISSUES {
    uuid id PK
    uuid project_id FK
    int issue_number
    bigint gh_issue_id UK
    jsonb raw_data "Cached_GitHub_JSON"
    timestamp last_synced_at
  }
  TUTORIAL_CONTENTS {
    uuid id PK
    uuid issue_id FK "Unique_per_Issue"
    string title
    text markdown_body "The_AI_Output"
    string status "PENDING_COMPLETED_FAILED"
    timestamp created_at
    timestamp updated_at
  }
  TUTORIALS {
    uuid id PK
    uuid user_id FK
    uuid content_id FK
    boolean is_original_requester
    timestamp created_at
  }
  CONCEPTS {
    uuid id PK
    string slug UK "e-g-message-queues"
    string name
    text description
  }
  CONCEPT_RELATIONSHIPS {
    uuid parent_id FK
    uuid child_id FK
    string rel_type "subconcept_of"
  }
  PROJECT_CONCEPTS {
    uuid project_id FK
    uuid concept_id FK
  }
  TUTORIAL_CONCEPTS {
    uuid content_id FK
    uuid concept_id FK
  }
```
- **PROJECTS**: GitHub repositories tracked by IssueSight, storing repository metadata (owner, name, language) keyed by a unique GitHub repository ID
- **GITHUB_ISSUES**: Issues fetched from GitHub, linked to projects, with raw JSONB data (`raw_data`) containing the body, comments, and labels for flexibility
- **TUTORIAL_CONTENTS**: AI-generated context bridges (one per issue via a unique `issue_id` constraint), stored as markdown with status tracking (PENDING, COMPLETED, FAILED)
- **USERS**: User accounts with quota management via the `last_requested_at` timestamp for rate limiting
- **USER_IDENTITIES**: OAuth provider mappings (GitHub, Google) linking external provider IDs to user accounts for multi-provider authentication
- **TUTORIALS**: Junction table tracking which users have unlocked which tutorial contents, with an `is_original_requester` flag
- **CONCEPTS**: Reusable concept definitions (e.g., "message-queues") identified by unique slugs, used for tagging and categorization
- **CONCEPT_RELATIONSHIPS**: Self-referential table enabling hierarchical concept relationships (parent-child) with relationship types like "subconcept_of"
- **PROJECT_CONCEPTS**: Junction table linking projects to concepts for project categorization
- **TUTORIAL_CONCEPTS**: Junction table linking tutorial contents to concepts for content tagging
- **One-to-Many:**
  - `USERS` → `USER_IDENTITIES` (users can authenticate via multiple providers)
  - `USERS` → `TUTORIALS` (users can unlock multiple tutorials)
  - `PROJECTS` → `GITHUB_ISSUES` (projects contain multiple issues)
  - `TUTORIAL_CONTENTS` → `TUTORIALS` (one tutorial content can serve multiple users)
  - `CONCEPTS` → `PROJECT_CONCEPTS` (concepts can tag multiple projects)
  - `CONCEPTS` → `TUTORIAL_CONCEPTS` (concepts can tag multiple tutorials)
  - `CONCEPTS` → `CONCEPT_RELATIONSHIPS` (concepts can have parent/child relationships)
- **One-to-One:**
  - `GITHUB_ISSUES` → `TUTORIAL_CONTENTS` (a unique `issue_id` constraint ensures one tutorial per issue)
- **Many-to-Many:**
  - `PROJECTS` ↔ `CONCEPTS` (via the `PROJECT_CONCEPTS` junction table)
  - `TUTORIAL_CONTENTS` ↔ `CONCEPTS` (via the `TUTORIAL_CONCEPTS` junction table)
  - `CONCEPTS` ↔ `CONCEPTS` (via `CONCEPT_RELATIONSHIPS` for hierarchical relationships)
This design enables efficient querying, supports concept-based discovery and hierarchical concept organization, maintains data integrity through proper constraints, and allows flexible JSONB storage for volatile GitHub API responses while tracking user access and quota limits.
I chose Redis Streams over a simple cron job to decouple the fetching logic from the processing logic. This allows the system to scale independently—if issue volume spikes, I can simply spin up more AI Worker replicas without changing the Collector code.
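The scaling argument can be illustrated in-process with a Go channel standing in for the Redis Stream: the producer only writes to the stream, so adding consumer goroutines (the AI Worker analogue) raises throughput without touching producer code. This is an analogy only; Redis Streams additionally provide persistence and consumer-group acknowledgements:

```go
package main

import (
	"fmt"
	"sync"
)

// Process fans `workers` consumers out over a shared stream of issue IDs.
// The producer (Collector analogue) only writes to the channel; scaling is
// purely a matter of how many consumer goroutines we start.
func Process(issues []int, workers int) []int {
	stream := make(chan int)               // stand-in for the Redis Stream
	results := make(chan int, len(issues)) // buffered so workers never block
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range stream {
				results <- id * 2 // placeholder for the LLM processing step
			}
		}()
	}

	for _, id := range issues {
		stream <- id // producer is unaware of how many consumers exist
	}
	close(stream)
	wg.Wait()
	close(results)

	out := make([]int, 0, len(issues))
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(Process([]int{1, 2, 3, 4}, 3))
}
```

Changing the `workers` argument is the in-process equivalent of spinning up more AI Worker replicas against the same consumer group.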
GitHub's API response is large and volatile. Instead of strictly normalizing every field, I use a Hybrid Schema:
- **Structured Columns:** `id`, `status`, `difficulty` (indexed for fast lookups and filtering).
- **JSONB:** `raw_github_payload` (stored as-is for future flexibility without schema migrations).
Go was selected for its native concurrency primitives (goroutines), which are essential for handling many concurrent HTTP requests and background stream processing with a smaller memory footprint than Node.js or Python.
| Component | Technology | Reasoning |
|---|---|---|
| Frontend | Next.js 14 (TypeScript, App Router) | Modern React framework with server-side rendering. |
| Backend | Golang (Gin/Standard Lib) | Strong typing, high performance, native concurrency. |
| Database | PostgreSQL 16 | ACID compliance with JSONB support. |
| Message Broker | Redis Streams | Lightweight, low-latency event buffering. |
| Caching | Redis KV | High-speed read access for API endpoints. |
| AI Layer | OpenAI GPT-5 | Context analysis and prerequisite generation. |
| Infrastructure | Docker Compose | Reproducible local development environment. |
The default model is configured via `LLM_MODEL=gpt-5`. If needed, roll back with `LLM_MODEL=gpt-4o`.
```
issuesight/
├── web/              # Next.js Frontend Service
├── backend/          # Go Microservices
│   ├── gateway/      # API Gateway
│   ├── collector/    # GitHub Issue Collector
│   └── ai-processor/ # AI Content Generator
├── internal/         # Shared Go Packages
│   ├── platform/     # Platform utilities (db, stream, lock)
│   └── domain/       # Shared domain types
└── deployments/      # Docker Compose & Environment Configs
```

