Implement GitHub Context Subagent #10

@prosdev

Description

Implement a GitHub Context Subagent that indexes GitHub data (issues, PRs, discussions) and makes it semantically searchable. This extends dev-agent's context provision from code files to GitHub metadata, so AI assistants can draw on the full project context rather than the code alone.

Core Mission Alignment

Goal: Provide relevant context to AI tools, reducing hallucinations

Current: We index code files and documentation
Missing: GitHub issues, PRs, discussions, comments
Solution: Index GitHub data like we index code

Acceptance Criteria

GitHub Data Indexing:

  • Fetch issues via GitHub CLI (gh)
  • Fetch PRs with metadata (commits, reviews, comments)
  • Index issue/PR descriptions as vector embeddings
  • Store GitHub metadata (labels, status, timestamps, links)
  • Support incremental updates (only fetch changed data)

Semantic Search:

  • Search issues: dev gh search "authentication bug"
  • Search PRs: dev gh search --type pr "performance"
  • Find related issues: dev gh related --issue 42
  • Combined search (code + GitHub): dev search "oauth" --include-github

Context Provision:

  • Get issue context: dev gh context --issue 42
  • Returns: issue data, related PRs, linked code files, discussions
  • Integrate with Planner (better issue understanding)
  • Integrate with Explorer (find related issues to code patterns)

General:

  • Integrates with subagent coordinator
  • Uses existing vector storage (LanceDB)
  • Respects .gitignore and privacy settings
  • Handles rate limiting gracefully
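One way to read "handles rate limiting gracefully" is retrying with exponential backoff. The sketch below is an assumption, not a settled design: `backoffDelayMs`, `isRateLimitError`, and `withBackoff` are hypothetical helper names, and matching on the phrase "rate limit" in the error message is a guess at how failures would surface from the fetcher.

```typescript
// Hypothetical helpers; none of these names exist in dev-agent yet.
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before the next retry: base, 2x base, 4x base, ... capped at maxDelayMs.
function backoffDelayMs(
  attempt: number,
  baseDelayMs = 1000,
  maxDelayMs = 60000,
): number {
  return Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
}

// Assumption: rate-limit failures carry "rate limit" somewhere in the message.
function isRateLimitError(err: unknown): boolean {
  return err instanceof Error && /rate limit/i.test(err.message);
}

// Retry `operation` on rate-limit errors only; rethrow anything else at once.
async function withBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (!isRateLimitError(err) || attempt >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Injecting `sleep` keeps the retry loop testable without real waits.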

Architecture

┌─────────────────────────────────────────────────────────┐
│                     dev-agent                            │
├─────────────────────────────────────────────────────────┤
│ Code Scanner    → Indexer → Vector Store (LanceDB)      │
│ GitHub Fetcher  → Indexer → Vector Store (LanceDB)      │
│                              ↓                           │
│                     Semantic Search                      │
│                    (code + GitHub)                       │
└─────────────────────────────────────────────────────────┘

Use Cases

1. Issue Context for AI

dev gh context --issue 42

# Returns full context:
# - Issue description
# - Related issues (#35, #28)
# - Related PRs (#40)
# - Affected code files (via links/mentions)
# - Discussion threads

2. Enhanced Planning

dev plan 42

# Planner now has access to:
# - Full issue description (not just gh CLI output)
# - Related issues for context
# - Previous similar work (from closed issues/PRs)

3. Pattern Discovery

dev explore pattern "error handling"

# Explorer finds:
# - Code patterns
# - Related issues discussing error handling
# - PRs that improved error handling

4. Knowledge Base

dev gh search "how do we handle rate limiting"

# Searches:
# - Issue discussions
# - PR descriptions
# - Code comments
# - Documentation

CLI Commands

# Indexing
dev gh index                     # Index all GitHub data
dev gh index --since 2024-01-01  # Incremental update
dev gh update                    # Refresh changed items

# Searching
dev gh search "query"            # Search all GitHub data
dev gh search "query" --type issue
dev gh search "query" --type pr

# Context
dev gh context --issue 42        # Get full context for issue
dev gh related --issue 42        # Find related issues/PRs

# Stats
dev gh stats                     # Show indexed data stats

Technical Implementation

Phase 1: GitHub Fetcher (Day 1)

  • Use gh CLI to fetch issues and PRs
  • Parse JSON output into structured types
  • Handle pagination and rate limits
  • Store raw data with metadata
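A minimal sketch of Phase 1, assuming issues are fetched with `gh issue list --json ...` and the output is parsed separately. `buildIssueListArgs`, `parseGhIssues`, and `FetchedIssue` are illustrative names, not existing dev-agent code. The sketch only builds the argument list and parses the result; actually spawning `gh` (for example with Node's `child_process.execFile`) is left out so the parsing stays testable on its own.

```typescript
// Fields requested from `gh issue list --json`.
const ISSUE_FIELDS = [
  "number", "title", "body", "state", "labels",
  "author", "createdAt", "updatedAt", "comments",
];

// Arguments for the gh CLI. `--limit` caps the total; gh pages internally.
function buildIssueListArgs(limit = 1000): string[] {
  return [
    "issue", "list",
    "--state", "all",
    "--limit", String(limit),
    "--json", ISSUE_FIELDS.join(","),
  ];
}

// Shape of one element of the gh JSON output, limited to the fields above.
interface RawGhIssue {
  number: number;
  title: string;
  body: string;
  state: string; // gh reports "OPEN" or "CLOSED" for issues
  labels: { name: string }[];
  author: { login: string } | null;
  createdAt: string;
  updatedAt: string;
  comments: unknown[];
}

// Hypothetical structured type, a subset of the issue's GitHubDocument.
interface FetchedIssue {
  type: "issue";
  number: number;
  title: string;
  body: string;
  state: "open" | "closed";
  labels: string[];
  author: string;
  createdAt: string;
  updatedAt: string;
  comments: number;
}

// Parse the raw JSON printed by gh into structured documents.
function parseGhIssues(json: string): FetchedIssue[] {
  const raw = JSON.parse(json) as RawGhIssue[];
  return raw.map((issue) => ({
    type: "issue",
    number: issue.number,
    title: issue.title,
    body: issue.body ?? "",
    state: issue.state.toLowerCase() === "closed" ? "closed" : "open",
    labels: issue.labels.map((label) => label.name),
    author: issue.author?.login ?? "unknown",
    createdAt: issue.createdAt,
    updatedAt: issue.updatedAt,
    comments: issue.comments.length,
  }));
}
```

PRs would follow the same pattern with `gh pr list`, whose `state` field can also report a merged state.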

Phase 2: Indexing (Day 1-2)

  • Extract text from issues/PRs for embedding
  • Generate vectors using Transformers.js (same as code)
  • Store in LanceDB with GitHub-specific metadata
  • Implement incremental updates
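The two pure steps of Phase 2 can be sketched without touching the embedder or LanceDB: building the text that gets embedded, and deciding which documents need re-indexing. `toEmbeddingText`, `docKey`, and `selectChanged` are hypothetical names, and tracking the last indexed `updatedAt` per document is one possible approach to incremental updates, not a decided one.

```typescript
// Minimal document shape for indexing (a subset of GitHubDocument).
interface IndexableDoc {
  type: "issue" | "pr" | "discussion";
  number: number;
  title: string;
  body: string;
  labels: string[];
  updatedAt: string; // ISO 8601 timestamp
}

// Combine title, labels, and body into one string for the embedding model.
function toEmbeddingText(doc: IndexableDoc): string {
  const header = `${doc.type} #${doc.number}: ${doc.title}`;
  const labels =
    doc.labels.length > 0 ? `Labels: ${doc.labels.join(", ")}` : "";
  return [header, labels, doc.body.trim()]
    .filter((part) => part.length > 0)
    .join("\n\n");
}

// Stable key per document; issue and PR keys are kept separate by type.
function docKey(doc: IndexableDoc): string {
  return `${doc.type}#${doc.number}`;
}

// Keep documents that are new or whose updatedAt is later than the
// timestamp recorded when they were last indexed.
function selectChanged(
  docs: IndexableDoc[],
  lastIndexed: Record<string, string>,
): IndexableDoc[] {
  return docs.filter((doc) => {
    const previous = lastIndexed[docKey(doc)];
    return (
      previous === undefined ||
      Date.parse(doc.updatedAt) > Date.parse(previous)
    );
  });
}
```

Only the documents returned by `selectChanged` would need new vectors, which keeps `dev gh update` cheap on large repositories.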

Phase 3: Search & Context (Day 2)

  • Semantic search over GitHub data
  • Context assembly (issue + related items)
  • Integration with existing search
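To make the ranking behind `dev gh search` and `dev gh related` concrete, here is an in-memory sketch using cosine similarity. It is a stand-in for illustration only: the real implementation would presumably delegate nearest-neighbour search to LanceDB, and `IndexedItem`, `searchItems`, and `findRelated` are hypothetical names.

```typescript
interface IndexedItem {
  type: "issue" | "pr" | "discussion";
  number: number;
  title: string;
  vector: number[]; // embedding of the item's text
}

interface SearchHit {
  item: IndexedItem;
  score: number; // cosine similarity, higher is more similar
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Rank items against a query vector, optionally filtered by type.
function searchItems(
  queryVector: number[],
  items: IndexedItem[],
  options: { type?: IndexedItem["type"]; limit?: number } = {},
): SearchHit[] {
  const limit = options.limit ?? 10;
  return items
    .filter((item) => options.type === undefined || item.type === options.type)
    .map((item) => ({ item, score: cosineSimilarity(queryVector, item.vector) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, limit);
}

// "Related" items: search with the source item's own vector, excluding it.
function findRelated(
  issueNumber: number,
  items: IndexedItem[],
  limit = 5,
): SearchHit[] {
  const source = items.filter((item) => item.number === issueNumber)[0];
  if (source === undefined) return [];
  const others = items.filter((item) => item !== source);
  return searchItems(source.vector, others, { limit });
}
```

Context assembly for `dev gh context --issue 42` could then combine the issue itself with the top hits from `findRelated`.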

Phase 4: Agent Integration (Day 3)

  • Expose via Subagent Coordinator
  • Integrate with Planner agent
  • Integrate with Explorer agent
  • Message-based communication
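For the message-based integration, one possible shape is a small set of typed requests that the subagent answers with a uniform response. Everything here is hypothetical: the Subagent Coordinator's real message format is not described in this issue, so `GitHubRequest`, `GitHubResponse`, `GitHubBackend`, and `handleGitHubRequest` are placeholder names for discussion.

```typescript
// Hypothetical requests the Planner or Explorer could send.
type GitHubRequest =
  | { action: "search"; query: string; type?: "issue" | "pr" }
  | { action: "context"; issue: number }
  | { action: "related"; issue: number };

// Uniform reply: either a payload or an error message.
interface GitHubResponse {
  ok: boolean;
  action: string;
  payload?: unknown;
  error?: string;
}

// The operations the subagent needs from its search and context layer.
interface GitHubBackend {
  search(query: string, type?: "issue" | "pr"): unknown;
  context(issue: number): unknown;
  related(issue: number): unknown;
}

function dispatch(request: GitHubRequest, backend: GitHubBackend): unknown {
  switch (request.action) {
    case "search":
      return backend.search(request.query, request.type);
    case "context":
      return backend.context(request.issue);
    case "related":
      return backend.related(request.issue);
    default:
      throw new Error("Unknown action");
  }
}

// Turn a request into a response, converting thrown errors into `ok: false`.
function handleGitHubRequest(
  request: GitHubRequest,
  backend: GitHubBackend,
): GitHubResponse {
  try {
    return {
      ok: true,
      action: request.action,
      payload: dispatch(request, backend),
    };
  } catch (err) {
    return {
      ok: false,
      action: request.action,
      error: err instanceof Error ? err.message : String(err),
    };
  }
}
```

Keeping the backend behind an interface means the Planner and Explorer integrations can be tested against a stub before indexing exists.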

Data Model

interface GitHubDocument {
  type: 'issue' | 'pr' | 'discussion';
  number: number;
  title: string;
  body: string;
  state: 'open' | 'closed' | 'merged';  // 'merged' applies to PRs only
  labels: string[];
  author: string;
  createdAt: string;
  updatedAt: string;
  comments: number;
  relatedIssues: number[];  // Extracted from links
  relatedPRs: number[];     // Extracted from links
  linkedFiles: string[];    // Mentioned in issue/PR
}
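The `relatedIssues`, `relatedPRs`, and `linkedFiles` fields have to be extracted from free text. The sketch below shows one heuristic, with `extractReferences` as a hypothetical helper. Note that `#40` in a body does not say whether it is an issue or a PR (they share one number space), so the sketch returns plain numbers and assumes the split happens later by looking each number up in the fetched data. The file-path pattern is a rough guess that can also match fragments of URLs.

```typescript
interface ExtractedReferences {
  numbers: number[];     // "#123" style references, deduplicated and sorted
  linkedFiles: string[]; // path-like mentions such as src/auth/login.ts
}

// Return `values` without duplicates, preserving first-seen order.
function unique<T>(values: T[]): T[] {
  return values.filter((value, index) => values.indexOf(value) === index);
}

function extractReferences(text: string): ExtractedReferences {
  // "#" followed by digits, not preceded by a word character or "&"
  // (which skips things like "abc#1" and HTML entities such as "&#39;").
  const referencePattern = /(^|[^\w&])#(\d+)\b/g;
  const numbers: number[] = [];
  let match = referencePattern.exec(text);
  while (match !== null) {
    numbers.push(Number(match[2]));
    match = referencePattern.exec(text);
  }

  // Heuristic: at least one "/" and a file extension at the end.
  const filePattern = /\b[\w.-]+(?:\/[\w.-]+)+\.[A-Za-z0-9]+\b/g;
  const files = text.match(filePattern) ?? [];

  return {
    numbers: unique(numbers).sort((a, b) => a - b),
    linkedFiles: unique(files),
  };
}
```

Explicit GitHub links (full issue or PR URLs) would need a separate pattern and are not covered here.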

Dependencies

Success Metrics

  • AI assistants can find relevant issues/PRs for any query
  • Planner generates better task breakdowns with GitHub context
  • Explorer discovers cross-cutting concerns (code + issues)
  • Developers spend less time searching GitHub manually

Future Enhancements

  • Index PR review comments
  • Index discussion threads
  • Track issue relationships (blocks/blocked-by)
  • Temporal analysis (issue trends over time)
  • Integration with GitHub Projects

Branch: feat/github-context-subagent
Priority: High (enables better AI assistance)
Estimate: 3 days
Parent Epic: #1 (Core Context Provider)
