Bridging the gap between "Good First Issues" and "Great First Contributions" via AI-driven mentorship.
IssueSight is a distributed, event-driven platform designed to solve a specific problem in the Open Source ecosystem: Context Switching.
Junior engineers often struggle to contribute not because they can't code, but because they lack the domain context of massive repositories. IssueSight ingests GitHub issues and uses LLMs to generate "Context Bridges" by breaking down complex tickets into junior-level prerequisites, architectural summaries, and implementation guides.
The system follows a vertical Microservices Layering pattern in a monorepo structure. Traffic flows from the Next.js Frontend (Client) through the Go Gateway (Center) down to the Persistence Layer (Bottom).
```mermaid
---
config:
  theme: neo-dark
---
flowchart TB
  subgraph ClientLayer["1. Client Layer"]
    direction TB
    UserApp("User")
  end
  subgraph GatewayLayer["2. Gateway Layer"]
    direction TB
    APIGateway["API Gateway"]
    AuthMgr["Auth & Quota Manager"]
    LockMgr["Lock Manager"]
  end
  subgraph ExternalLayer["5. External Ecosystem"]
    GitHub("GitHub API")
    LLM("LLM Provider")
  end
  subgraph LogicLayer["3. Logic & Processing Layer"]
    direction TB
    Collector["Collector Worker"]
    AIWorker["AI Generator Worker"]
  end
  subgraph DataLayer["4. Data & State Layer"]
    direction TB
    MongoDB[("MongoDB\nAuth & Quotas")]
    Redis[("Redis Speed Layer\nCache/Locks/Stream")]
    Postgres[("PostgreSQL\nTutorial Archive")]
  end
  UserApp -- "1. Submit Issue / Auth" --> APIGateway
  APIGateway -.-> AuthMgr & LockMgr
  AuthMgr -- "2. Check Limit" --> MongoDB
  LockMgr -- "3. Distributed Lock" --> Redis
  APIGateway -- "4. Enqueue Task" --> Redis
  Collector -- "5. Poll Metadata" --> GitHub
  Collector -- "6. Push Context" --> Redis
  Redis -- "7. Stream Consume" --> AIWorker
  AIWorker -- "8. Generate Content" --> LLM
  AIWorker -- "9. Persist Tutorial" --> Postgres
  UserApp:::client
  APIGateway:::gateway
  AuthMgr:::gateway
  LockMgr:::gateway
  GitHub:::external
  LLM:::external
  Collector:::worker
  AIWorker:::worker
  MongoDB:::data
  Redis:::data
  Postgres:::data
  classDef client fill:#fff3e0,stroke:#f57c00,stroke-width:2px,rx:10,ry:10
  classDef gateway fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,rx:5,ry:5
  classDef worker fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,rx:5,ry:5
  classDef data fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
  classDef external fill:#eeeeee,stroke:#999999,stroke-width:2px,stroke-dasharray: 5 5,rx:5,ry:5
  style UserApp fill:#00C853,color:#000000
  style APIGateway fill:#2962FF
  style AuthMgr fill:#2962FF
  style LockMgr fill:#2962FF
  style GitHub fill:#2962FF
  style LLM fill:#00C853
  style Collector fill:#2962FF
  style AIWorker fill:#FFD600,color:#000000
  style MongoDB fill:#FF6D00
  style Redis fill:#2962FF
  style Postgres fill:#00C853
  style GatewayLayer stroke:#00C853,fill:#00C853,color:#000000
  style DataLayer fill:#FF6D00,color:#000000
  style LogicLayer fill:#00C853,color:#000000
  style ExternalLayer fill:#FFD600,color:#000000
  style ClientLayer fill:#BBDEFB,color:#000000
  linkStyle 0 stroke:#f57c00,stroke-width:2px,fill:none
  linkStyle 1 stroke:#2962FF,fill:none
  linkStyle 2 stroke:#2962FF,fill:none
  linkStyle 3 stroke:#2962FF,fill:none
  linkStyle 4 stroke:#2962FF,fill:none
  linkStyle 5 stroke:#2962FF,fill:none
  linkStyle 6 stroke:#000000,fill:none
  linkStyle 7 stroke:#000000,fill:none
  linkStyle 8 stroke:#2e7d32,stroke-width:2px,fill:none
  linkStyle 9 stroke:#2e7d32,stroke-width:2px,fill:none
  linkStyle 10 stroke:#2962FF,fill:none
```
- **Ingestion (The Write Path, blue lines):** A background `Collector` service polls GitHub and pushes raw events to a Redis Stream. This ensures that if the GitHub API is slow or rate-limited, it does not block the rest of the application.
- **Processing (The Worker):** The `AI Worker` consumes the stream, using OpenAI to analyze code complexity. It determines whether an issue is truly "Junior Friendly" or requires advanced knowledge.
- **Serving (The Read Path, orange lines):** The `API Gateway` serves the frontend. It implements a Cache-Aside strategy: popular issues are served from Redis KV memory (<5 ms), while the database is hit only on cache misses.
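The Cache-Aside read path can be sketched as follows. The in-memory `Cache` and `TutorialStore` types below are hypothetical stand-ins for the real Redis and PostgreSQL clients, not the actual gateway code:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is a cached value with an expiry, mimicking a Redis SET ... EX.
type entry struct {
	val     string
	expires time.Time
}

// Cache is a minimal stand-in for the Redis KV layer.
type Cache struct {
	mu sync.Mutex
	m  map[string]entry
}

func NewCache() *Cache { return &Cache{m: map[string]entry{}} }

func (c *Cache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.m[key]
	if !ok || time.Now().After(e.expires) {
		return "", false
	}
	return e.val, true
}

func (c *Cache) Set(key, val string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key] = entry{val: val, expires: time.Now().Add(ttl)}
}

// TutorialStore simulates the PostgreSQL archive; DBHits counts queries.
type TutorialStore struct {
	rows   map[string]string
	DBHits int
}

// GetTutorial implements cache-aside: try the cache first, fall back to the
// store on a miss, then populate the cache for subsequent readers.
func GetTutorial(c *Cache, s *TutorialStore, id string) (string, error) {
	if v, ok := c.Get("tutorial:" + id); ok {
		return v, nil // cache hit: served from memory
	}
	s.DBHits++ // simulate one SELECT against Postgres
	v, ok := s.rows[id]
	if !ok {
		return "", fmt.Errorf("tutorial %s not found", id)
	}
	c.Set("tutorial:"+id, v, 10*time.Minute)
	return v, nil
}

func main() {
	cache := NewCache()
	store := &TutorialStore{rows: map[string]string{"42": "How to fix issue #42"}}
	for i := 0; i < 3; i++ {
		v, _ := GetTutorial(cache, store, "42")
		fmt.Println(v)
	}
	fmt.Println("db hits:", store.DBHits) // prints "db hits: 1"
}
```

The TTL bounds staleness: a popular tutorial is re-read from Postgres at most once per expiry window, no matter how many users request it.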
The database schema follows a normalized relational design with PostgreSQL as the primary data store. The ERD below illustrates the core entities and their relationships:
```mermaid
erDiagram
  USERS ||--o{ USER_IDENTITIES : "authenticates_via"
  USERS ||--o{ TUTORIALS : "unlocks"
  PROJECTS ||--o{ GITHUB_ISSUES : "contains"
  PROJECTS ||--o{ PROJECT_CONCEPTS : "categorized_by"
  GITHUB_ISSUES ||--o| TUTORIAL_CONTENTS : "generates"
  CONCEPTS ||--o{ PROJECT_CONCEPTS : "defines"
  CONCEPTS ||--o{ TUTORIAL_CONCEPTS : "tags"
  CONCEPTS ||--o{ CONCEPT_RELATIONSHIPS : "is_parent_of"
  CONCEPTS ||--o{ CONCEPT_RELATIONSHIPS : "is_child_of"
  TUTORIAL_CONTENTS ||--o{ TUTORIALS : "serves"
  TUTORIAL_CONTENTS ||--o{ TUTORIAL_CONCEPTS : "explains"
  USERS {
    uuid id PK
    string email UK
    string display_name
    string avatar_url
    timestamp last_requested_at "Quota_Anchor"
    timestamp created_at
  }
  USER_IDENTITIES {
    uuid id PK
    uuid user_id FK
    string provider "github_or_google"
    string provider_id UK "External_ID"
  }
  PROJECTS {
    uuid id PK
    bigint gh_repo_id UK
    string owner_handle
    string repo_name
    string full_name UK
    string language
    timestamp created_at
  }
  GITHUB_ISSUES {
    uuid id PK
    uuid project_id FK
    int issue_number
    bigint gh_issue_id UK
    jsonb raw_data "Cached_GitHub_JSON"
    timestamp last_synced_at
  }
  TUTORIAL_CONTENTS {
    uuid id PK
    uuid issue_id FK "Unique_per_Issue"
    string title
    text markdown_body "The_AI_Output"
    string status "PENDING_COMPLETED_FAILED"
    timestamp created_at
    timestamp updated_at
  }
  TUTORIALS {
    uuid id PK
    uuid user_id FK
    uuid content_id FK
    boolean is_original_requester
    timestamp created_at
  }
  CONCEPTS {
    uuid id PK
    string slug UK "e-g-message-queues"
    string name
    text description
  }
  CONCEPT_RELATIONSHIPS {
    uuid parent_id FK
    uuid child_id FK
    string rel_type "subconcept_of"
  }
  PROJECT_CONCEPTS {
    uuid project_id FK
    uuid concept_id FK
  }
  TUTORIAL_CONCEPTS {
    uuid content_id FK
    uuid concept_id FK
  }
```
- **PROJECTS**: GitHub repositories tracked by IssueSight, storing repository metadata (owner, name, language) keyed by a unique GitHub repository ID
- **GITHUB_ISSUES**: Issues fetched from GitHub, linked to projects, with raw JSONB data (`raw_data`) containing the body, comments, and labels for flexibility
- **TUTORIAL_CONTENTS**: AI-generated context bridges (one per issue via a unique `issue_id` constraint), stored as markdown with status tracking (PENDING, COMPLETED, FAILED)
- **USERS**: User accounts with quota management via the `last_requested_at` timestamp for rate limiting
- **USER_IDENTITIES**: OAuth provider mappings (GitHub, Google) linking external provider IDs to user accounts for multi-provider authentication
- **TUTORIALS**: Junction table tracking which users have unlocked which tutorial contents, with an `is_original_requester` flag
- **CONCEPTS**: Reusable concept definitions (e.g., "message-queues") identified by unique slugs, used for tagging and categorization
- **CONCEPT_RELATIONSHIPS**: Self-referential table enabling hierarchical concept relationships (parent-child) with relationship types like "subconcept_of"
- **PROJECT_CONCEPTS**: Junction table linking projects to concepts for project categorization
- **TUTORIAL_CONCEPTS**: Junction table linking tutorial contents to concepts for content tagging
- **One-to-Many:**
  - `USERS` → `USER_IDENTITIES` (users can authenticate via multiple providers)
  - `USERS` → `TUTORIALS` (users can unlock multiple tutorials)
  - `PROJECTS` → `GITHUB_ISSUES` (projects contain multiple issues)
  - `TUTORIAL_CONTENTS` → `TUTORIALS` (one tutorial content can serve multiple users)
  - `CONCEPTS` → `PROJECT_CONCEPTS` (concepts can tag multiple projects)
  - `CONCEPTS` → `TUTORIAL_CONCEPTS` (concepts can tag multiple tutorials)
  - `CONCEPTS` → `CONCEPT_RELATIONSHIPS` (concepts can have parent/child relationships)
- **One-to-One:**
  - `GITHUB_ISSUES` → `TUTORIAL_CONTENTS` (a unique `issue_id` constraint ensures one tutorial per issue)
- **Many-to-Many:**
  - `PROJECTS` ↔ `CONCEPTS` (via the `PROJECT_CONCEPTS` junction table)
  - `TUTORIAL_CONTENTS` ↔ `CONCEPTS` (via the `TUTORIAL_CONCEPTS` junction table)
  - `CONCEPTS` ↔ `CONCEPTS` (via `CONCEPT_RELATIONSHIPS` for hierarchical relationships)
This design enables efficient querying, supports concept-based discovery and hierarchical concept organization, maintains data integrity through proper constraints, and allows flexible JSONB storage for volatile GitHub API responses while tracking user access and quota limits.
I chose Redis Streams over a simple cron job to decouple the fetching logic from the processing logic. This allows the system to scale independently—if issue volume spikes, I can simply spin up more AI Worker replicas without changing the Collector code.
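The scaling argument can be illustrated in-process with a Go channel standing in for the Redis Stream: the producer only writes to the stream, so adding consumer goroutines (the AI Worker analogue) raises throughput without touching producer code. This is an analogy only; Redis Streams additionally provide persistence and consumer-group acknowledgements:

```go
package main

import (
	"fmt"
	"sync"
)

// Process fans `workers` consumers out over a shared stream of issue IDs.
// The producer (Collector analogue) only writes to the channel; scaling is
// purely a matter of how many consumer goroutines we start.
func Process(issues []int, workers int) []int {
	stream := make(chan int)               // stand-in for the Redis Stream
	results := make(chan int, len(issues)) // buffered so workers never block
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range stream {
				results <- id * 2 // placeholder for the LLM processing step
			}
		}()
	}

	for _, id := range issues {
		stream <- id // producer is unaware of how many consumers exist
	}
	close(stream)
	wg.Wait()
	close(results)

	out := make([]int, 0, len(issues))
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(Process([]int{1, 2, 3, 4}, 3))
}
```

Changing the `workers` argument is the in-process equivalent of spinning up more AI Worker replicas against the same consumer group.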
GitHub's API response is large and volatile. Instead of strictly normalizing every field, I use a Hybrid Schema:
- **Structured Columns:** `id`, `status`, `difficulty` (indexed for fast lookups and filtering).
- **JSONB:** `raw_github_payload` (stored as-is for future flexibility without schema migrations).
Go was selected for its native concurrency primitives (goroutines), which are essential for handling many concurrent HTTP requests and background stream processing with a smaller memory footprint than Node.js or Python.
| Component | Technology | Reasoning |
|---|---|---|
| Frontend | Next.js 14 (TypeScript, App Router) | Modern React framework with server-side rendering. |
| Backend | Golang (Gin/Standard Lib) | Strong typing, high performance, native concurrency. |
| Database | PostgreSQL 16 | ACID compliance with JSONB support. |
| Message Broker | Redis Streams | Lightweight, low-latency event buffering. |
| Caching | Redis KV | High-speed read access for API endpoints. |
| AI Layer | OpenAI GPT-5 | Context analysis and prerequisite generation. |
| Infrastructure | Docker Compose | Reproducible local development environment. |
The default model is configured via `LLM_MODEL=gpt-5`. If needed, roll back with `LLM_MODEL=gpt-4o`.
```
issuesight/
├── web/              # Next.js Frontend Service
├── backend/          # Go Microservices
│   ├── gateway/      # API Gateway
│   ├── collector/    # GitHub Issue Collector
│   └── ai-processor/ # AI Content Generator
├── internal/         # Shared Go Packages
│   ├── platform/     # Platform utilities (db, stream, lock)
│   └── domain/       # Shared domain types
└── deployments/      # Docker Compose & Environment Configs
```

