IssueSight 🔭

Bridging the gap between "Good First Issues" and "Great First Contributions" via AI-driven mentorship.


[Image: IssueSight main page]


The Engineering Goal

IssueSight is a distributed, event-driven platform designed to solve a specific problem in the Open Source ecosystem: Context Switching.

Junior engineers often struggle to contribute not because they can't code, but because they lack the domain context of massive repositories. IssueSight ingests GitHub issues and uses LLMs to generate "Context Bridges" that break complex tickets down into junior-level prerequisites, architectural summaries, and implementation guides.


System Architecture

The system follows a vertical Microservices Layering pattern in a monorepo structure. Traffic flows from the Next.js Frontend (Client) through the Go Gateway (Center) down to the Persistence Layer (Bottom).

IssueSight Architecture Diagram

---
config:
  theme: neo-dark
---
flowchart TB
 subgraph ClientLayer["1. Client Layer"]
    direction TB
        UserApp("User")
  end
 subgraph GatewayLayer["2. Gateway Layer"]
    direction TB
        APIGateway["API Gateway"]
        AuthMgr["Auth & Quota Manager"]
        LockMgr["Lock Manager"]
  end
 subgraph ExternalLayer["5. External Ecosystem"]
        GitHub("GitHub API")
        LLM("LLM Provider")
  end
 subgraph LogicLayer["3. Logic & Processing Layer"]
    direction TB
        Collector["Collector Worker"]
        AIWorker["AI Generator Worker"]
  end
 subgraph DataLayer["4. Data & State Layer"]
    direction TB
        MongoDB[("MongoDB\nAuth & Quotas")]
        Redis[("Redis Speed Layer\nCache/Locks/Stream")]
        Postgres[("PostgreSQL\nTutorial Archive")]
  end
    UserApp -- "1. Submit Issue / Auth" --> APIGateway
    APIGateway -.-> AuthMgr & LockMgr
    AuthMgr -- "2. Check Limit" --> MongoDB
    LockMgr -- "3. Distributed Lock" --> Redis
    APIGateway -- "4. Enqueue Task" --> Redis
    Collector -- "5. Poll Metadata" --> GitHub
    Collector -- "6. Push Context" --> Redis
    Redis -- "7. Stream Consume" --> AIWorker
    AIWorker -- "8. Generate Content" --> LLM
    AIWorker -- "9. Persist Tutorial" --> Postgres

     UserApp:::client
     APIGateway:::gateway
     AuthMgr:::gateway
     LockMgr:::gateway
     GitHub:::external
     LLM:::external
     Collector:::worker
     AIWorker:::worker
     MongoDB:::data
     Redis:::data
     Postgres:::data
    classDef client fill:#fff3e0,stroke:#f57c00,stroke-width:2px,rx:10,ry:10
    classDef gateway fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,rx:5,ry:5
    classDef worker fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,rx:5,ry:5
    classDef data fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,shape:cyl
    classDef external fill:#eeeeee,stroke:#999999,stroke-width:2px,stroke-dasharray: 5 5,rx:5,ry:5
    style UserApp fill:#00C853,color:#000000
    style APIGateway fill:#2962FF
    style AuthMgr fill:#2962FF
    style LockMgr fill:#2962FF
    style GitHub fill:#2962FF
    style LLM fill:#00C853
    style Collector fill:#2962FF
    style AIWorker fill:#FFD600,color:#000000
    style MongoDB fill:#FF6D00
    style Redis fill:#2962FF
    style Postgres fill:#00C853
    style GatewayLayer stroke:#00C853,fill:#00C853,color:#000000
    style DataLayer fill:#FF6D00,color:#000000
    style LogicLayer fill:#00C853,color:#000000
    style ExternalLayer fill:#FFD600,color:#000000
    style ClientLayer fill:#BBDEFB,color:#000000
    linkStyle 0 stroke:#f57c00,stroke-width:2px,fill:none
    linkStyle 1 stroke:#2962FF,fill:none
    linkStyle 2 stroke:#2962FF,fill:none
    linkStyle 3 stroke:#2962FF,fill:none
    linkStyle 4 stroke:#2962FF,fill:none
    linkStyle 5 stroke:#2962FF,fill:none
    linkStyle 6 stroke:#000000,fill:none
    linkStyle 7 stroke:#000000,fill:none
    linkStyle 8 stroke:#2e7d32,stroke-width:2px,fill:none
    linkStyle 9 stroke:#2e7d32,stroke-width:2px,fill:none
    linkStyle 10 stroke:#2962FF,fill:none

Data Flow Breakdown

  1. Ingestion (The Write Path - Blue Lines): A background Collector service polls GitHub and pushes raw events to a Redis Stream. This ensures that if the GitHub API is slow or rate-limited, it does not block the rest of the application.
  2. Processing (The Worker): The AI Worker consumes the stream and uses OpenAI to analyze code complexity, determining whether an issue is truly "Junior Friendly" or requires advanced knowledge.
  3. Serving (The Read Path - Orange Lines): The API Gateway serves the frontend. It implements a Cache-Aside strategy: popular issues are served from Redis KV memory (<5ms), while the database is only hit on cache misses.
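The Cache-Aside read path in step 3 can be sketched roughly as follows. This is a minimal illustration, not the project's gateway code: the in-memory map stands in for Redis KV, the `load` callback stands in for the PostgreSQL query, and the names `Issue`, `IssueCache`, and `Get` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Issue is a hypothetical, trimmed-down view of an issue served to the frontend.
type Issue struct {
	ID    string
	Title string
}

// IssueCache is a cache-aside wrapper. The map stands in for Redis KV and
// load stands in for the database query; both are assumptions of this sketch.
type IssueCache struct {
	mu    sync.Mutex
	items map[string]Issue
	load  func(id string) (Issue, error)
}

// NewIssueCache builds an empty cache around the given loader.
func NewIssueCache(load func(id string) (Issue, error)) *IssueCache {
	return &IssueCache{items: make(map[string]Issue), load: load}
}

// Get returns the cached issue when present. On a miss it calls the loader,
// stores the result, and returns it. The bool reports whether it was a hit.
func (c *IssueCache) Get(id string) (Issue, bool, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if issue, ok := c.items[id]; ok {
		return issue, true, nil
	}
	issue, err := c.load(id)
	if err != nil {
		return Issue{}, false, err
	}
	c.items[id] = issue
	return issue, false, nil
}

func main() {
	dbCalls := 0
	cache := NewIssueCache(func(id string) (Issue, error) {
		dbCalls++ // counts how often the "database" is actually hit
		return Issue{ID: id, Title: "Fix flaky test"}, nil
	})
	for i := 0; i < 3; i++ {
		issue, hit, _ := cache.Get("42")
		fmt.Println(issue.Title, "hit:", hit)
	}
	fmt.Println("database calls:", dbCalls)
}
```

The sketch omits expiry and invalidation; a real Redis-backed cache would normally set a TTL on each entry so stale issues eventually fall out.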

Data Model

The database schema follows a normalized relational design with PostgreSQL as the primary data store. The ERD below illustrates the core entities and their relationships:

erDiagram
    USERS ||--o{ USER_IDENTITIES : "authenticates_via"
    USERS ||--o{ TUTORIALS : "unlocks"
    
    PROJECTS ||--o{ GITHUB_ISSUES : "contains"
    PROJECTS ||--o{ PROJECT_CONCEPTS : "categorized_by"
    
    GITHUB_ISSUES ||--o| TUTORIAL_CONTENTS : "generates"
    
    CONCEPTS ||--o{ PROJECT_CONCEPTS : "defines"
    CONCEPTS ||--o{ TUTORIAL_CONCEPTS : "tags"
    CONCEPTS ||--o{ CONCEPT_RELATIONSHIPS : "is_parent_of"
    CONCEPTS ||--o{ CONCEPT_RELATIONSHIPS : "is_child_of"
    
    TUTORIAL_CONTENTS ||--o{ TUTORIALS : "serves"
    TUTORIAL_CONTENTS ||--o{ TUTORIAL_CONCEPTS : "explains"

    USERS {
        uuid id PK
        string email UK
        string display_name
        string avatar_url
        timestamp last_requested_at "Quota_Anchor"
        timestamp created_at
    }

    USER_IDENTITIES {
        uuid id PK
        uuid user_id FK
        string provider "github_or_google"
        string provider_id UK "External_ID"
    }

    PROJECTS {
        uuid id PK
        bigint gh_repo_id UK
        string owner_handle
        string repo_name
        string full_name UK
        string language
        timestamp created_at
    }

    GITHUB_ISSUES {
        uuid id PK
        uuid project_id FK
        int issue_number
        bigint gh_issue_id UK
        jsonb raw_data "Cached_GitHub_JSON"
        timestamp last_synced_at
    }

    TUTORIAL_CONTENTS {
        uuid id PK
        uuid issue_id FK "Unique_per_Issue"
        string title
        text markdown_body "The_AI_Output"
        string status "PENDING_COMPLETED_FAILED"
        timestamp created_at
        timestamp updated_at
    }

    TUTORIALS {
        uuid id PK
        uuid user_id FK
        uuid content_id FK
        boolean is_original_requester
        timestamp created_at
    }

    CONCEPTS {
        uuid id PK
        string slug UK "e-g-message-queues"
        string name
        text description
    }

    CONCEPT_RELATIONSHIPS {
        uuid parent_id FK
        uuid child_id FK
        string rel_type "subconcept_of"
    }

    PROJECT_CONCEPTS {
        uuid project_id FK
        uuid concept_id FK
    }

    TUTORIAL_CONCEPTS {
        uuid content_id FK
        uuid concept_id FK
    }

Core Entities

  • PROJECTS: GitHub repositories tracked by IssueSight, storing repository metadata (owner, name, language) with unique GitHub repository ID
  • GITHUB_ISSUES: Issues fetched from GitHub, linked to projects with raw JSONB data (raw_data) containing body, comments, and labels for flexibility
  • TUTORIAL_CONTENTS: AI-generated context bridges (one per issue via unique issue_id constraint), stored as markdown with status tracking (PENDING, COMPLETED, FAILED)
  • USERS: User accounts with quota management via last_requested_at timestamp for rate limiting
  • USER_IDENTITIES: OAuth provider mappings (GitHub, Google) linking external provider IDs to user accounts for multi-provider authentication
  • TUTORIALS: Junction table tracking which users have unlocked which tutorial contents, with is_original_requester flag
  • CONCEPTS: Reusable concept definitions (e.g., "message-queues") identified by unique slugs, used for tagging and categorization
  • CONCEPT_RELATIONSHIPS: Self-referential table enabling hierarchical concept relationships (parent-child) with relationship types like "subconcept_of"
  • PROJECT_CONCEPTS: Junction table linking projects to concepts for project categorization
  • TUTORIAL_CONCEPTS: Junction table linking tutorial contents to concepts for content tagging
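To illustrate how a `last_requested_at` timestamp can act as a quota anchor for USERS, here is a minimal sketch. The 24-hour window, the `DefaultWindow` constant, and the `CanRequest` method are assumptions made for this example; the actual quota rules are not specified above.

```go
package main

import (
	"fmt"
	"time"
)

// DefaultWindow is a hypothetical cooldown between tutorial requests.
const DefaultWindow = 24 * time.Hour

// User carries only the column relevant to the quota check.
type User struct {
	LastRequestedAt time.Time // zero value means the user has never requested
}

// CanRequest reports whether at least `window` has elapsed since the
// user's last request, measured against the supplied clock reading.
func (u User) CanRequest(now time.Time, window time.Duration) bool {
	return !now.Before(u.LastRequestedAt.Add(window))
}

func main() {
	now := time.Now()
	recent := User{LastRequestedAt: now.Add(-1 * time.Hour)}
	stale := User{LastRequestedAt: now.Add(-48 * time.Hour)}
	fmt.Println(recent.CanRequest(now, DefaultWindow)) // false
	fmt.Println(stale.CanRequest(now, DefaultWindow))  // true
}
```

Passing `now` in as a parameter keeps the check deterministic and easy to test.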

Key Relationships

  • One-to-Many:

    • USERS → USER_IDENTITIES (users can authenticate via multiple providers)
    • USERS → TUTORIALS (users can unlock multiple tutorials)
    • PROJECTS → GITHUB_ISSUES (projects contain multiple issues)
    • TUTORIAL_CONTENTS → TUTORIALS (one tutorial content can serve multiple users)
    • CONCEPTS → PROJECT_CONCEPTS (concepts can tag multiple projects)
    • CONCEPTS → TUTORIAL_CONCEPTS (concepts can tag multiple tutorials)
    • CONCEPTS → CONCEPT_RELATIONSHIPS (concepts can have parent/child relationships)
  • One-to-One:

    • GITHUB_ISSUES → TUTORIAL_CONTENTS (unique issue_id constraint ensures one tutorial per issue)
  • Many-to-Many:

    • PROJECTS ↔ CONCEPTS (via PROJECT_CONCEPTS junction table)
    • TUTORIAL_CONTENTS ↔ CONCEPTS (via TUTORIAL_CONCEPTS junction table)
    • CONCEPTS ↔ CONCEPTS (via CONCEPT_RELATIONSHIPS for hierarchical relationships)

This design enables efficient querying, supports concept-based discovery and hierarchical concept organization, maintains data integrity through proper constraints, and allows flexible JSONB storage for volatile GitHub API responses while tracking user access and quota limits.


Key Technical Decisions

Why Redis Streams?

I chose Redis Streams over a simple cron job to decouple the fetching logic from the processing logic. This lets the two sides scale independently: if issue volume spikes, I can spin up more AI Worker replicas without changing the Collector code.
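The decoupling described here can be sketched without Redis at all. In the illustration below a buffered Go channel stands in for the Redis Stream, and the `Process` function, the `IssueEvent` type, and the replica count are hypothetical names chosen for the example rather than the project's code.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// IssueEvent is a hypothetical message the Collector would push to the stream.
type IssueEvent struct {
	IssueNumber int
}

// Process fans events from the stream out to `replicas` workers and returns
// the issue numbers they handled, sorted. The channel stands in for a Redis
// Stream and handle stands in for the AI Worker's per-event logic.
func Process(stream <-chan IssueEvent, replicas int, handle func(IssueEvent)) []int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		handled []int
	)
	for w := 0; w < replicas; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ev := range stream { // each event is delivered to exactly one worker
				handle(ev)
				mu.Lock()
				handled = append(handled, ev.IssueNumber)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	sort.Ints(handled)
	return handled
}

func main() {
	stream := make(chan IssueEvent, 8)
	// Producer side: plays the role of the Collector.
	go func() {
		for n := 1; n <= 5; n++ {
			stream <- IssueEvent{IssueNumber: n}
		}
		close(stream)
	}()
	// Changing the replica count requires no change to the producer.
	fmt.Println(Process(stream, 3, func(IssueEvent) {})) // prints [1 2 3 4 5]
}
```

The producer never references the workers, which is the property that allows the replica count to change on its own.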

Why PostgreSQL + JSONB?

GitHub's API response is large and volatile. Instead of strictly normalizing every field, I utilize a Hybrid Schema:

  • Structured Columns: id, status, difficulty (Indexed for fast lookups/filtering).
  • JSONB: raw_data, the raw GitHub payload (Stored as-is for future flexibility without schema migrations).
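On the Go side, a hybrid row of this kind might be modeled as a few typed fields plus an opaque payload that is decoded only when needed. This is a sketch under assumptions: the `IssueRow` type and its `Labels` helper are hypothetical, and the payload shape shown follows the `labels[].name` layout of GitHub's REST issue objects.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IssueRow mirrors the hybrid layout: typed columns for filtering plus the
// raw payload. Field names are illustrative, not the project's actual schema.
type IssueRow struct {
	ID         string
	Status     string
	Difficulty string
	Raw        json.RawMessage // JSONB column, kept exactly as received
}

// Labels decodes only the label names from the raw payload on demand, so
// fields GitHub adds or renames later do not force a schema migration.
func (r IssueRow) Labels() ([]string, error) {
	var payload struct {
		Labels []struct {
			Name string `json:"name"`
		} `json:"labels"`
	}
	if err := json.Unmarshal(r.Raw, &payload); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(payload.Labels))
	for _, l := range payload.Labels {
		names = append(names, l.Name)
	}
	return names, nil
}

func main() {
	row := IssueRow{
		ID:         "1",
		Status:     "COMPLETED",
		Difficulty: "junior",
		Raw:        json.RawMessage(`{"title":"Fix typo","labels":[{"name":"good first issue"}]}`),
	}
	labels, err := row.Labels()
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(row.Status, labels)
}
```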

Why Go?

Go was selected for its native concurrency primitives (goroutines), which are essential for handling many concurrent HTTP requests and background stream processing with a smaller memory footprint than Node.js or Python.


Tech Stack

| Component | Technology | Reasoning |
| --- | --- | --- |
| Frontend | Next.js 14 (TypeScript, App Router) | Modern React framework with server-side rendering. |
| Backend | Golang (Gin/Standard Lib) | Strong typing, high performance, native concurrency. |
| Database | PostgreSQL 16 | ACID compliance with JSONB support. |
| Message Broker | Redis Streams | Lightweight, low-latency event buffering. |
| Caching | Redis KV | High-speed read access for API endpoints. |
| AI Layer | OpenAI GPT-5 | Context analysis and prerequisite generation. |
| Infrastructure | Docker Compose | Reproducible local development environment. |

Default model is configured via LLM_MODEL=gpt-5. If needed, roll back with LLM_MODEL=gpt-4o.
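One way a service could resolve this setting is sketched below. The `ResolveModel` function and `defaultModel` constant are hypothetical names for illustration; only the `LLM_MODEL` variable and the `gpt-5` default come from the text above.

```go
package main

import (
	"fmt"
	"os"
)

// defaultModel matches the documented default for LLM_MODEL.
const defaultModel = "gpt-5"

// ResolveModel returns the model named by LLM_MODEL, falling back to the
// default when the variable is unset or empty. The lookup function is
// injected (os.LookupEnv in production) so the logic can be tested
// without touching the real environment.
func ResolveModel(lookup func(string) (string, bool)) string {
	if v, ok := lookup("LLM_MODEL"); ok && v != "" {
		return v
	}
	return defaultModel
}

func main() {
	fmt.Println("using model:", ResolveModel(os.LookupEnv))
}
```

Running the program with `LLM_MODEL=gpt-4o` set in the environment would select the rollback model instead of the default.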


Project Structure

issuesight/
├── web/                    # Next.js Frontend Service
├── backend/                # Go Microservices
│   ├── gateway/           # API Gateway
│   ├── collector/         # GitHub Issue Collector
│   └── ai-processor/      # AI Content Generator
├── internal/              # Shared Go Packages
│   ├── platform/         # Platform utilities (db, stream, lock)
│   └── domain/           # Shared domain types
└── deployments/           # Docker Compose & Environment Configs
