Implement Repository Scanner with ts-morph and Remark #3

@prosdev

Description

Implement a repository scanner using a hybrid architecture that combines 'tree-sitter' for universal multi-language support with 'ts-morph' for enhanced TypeScript/JavaScript analysis. This gives us broad syntax-level coverage (Go, Python, Rust) while retaining deep semantic analysis (types, references) for TypeScript/JavaScript, our primary language.

Acceptance Criteria

  • Architecture: Implement a 'ScannerRegistry' and 'LanguageScanner' interface to support pluggable scanners.
  • Universal Support: Implement a base 'TreeSitterScanner' that supports Go, Python, and Rust (syntax only).
  • Enhanced Support: Implement 'TypeScriptScanner' using 'ts-morph' for deep analysis (types, references) of TS/JS files.
  • Documentation: Implement 'MarkdownScanner' using 'remark' for READMEs and docs.
  • Output: All scanners produce objects conforming to a unified 'Document' interface with metadata (language, type, start/end lines).
  • Auto-detection: Registry automatically selects the best scanner based on file extension.
  • Capabilities: Each scanner exposes its capabilities (syntax, types, references, documentation).
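
The registry and auto-detection criteria above could be sketched roughly as follows. This is a minimal illustration, not the intended implementation: only the names 'ScannerRegistry', 'LanguageScanner', and 'ScannerCapabilities' come from this issue. The 'name', 'extensions', and 'priority' fields, the 'scannerFor' method, the 'stubScanner' helper, and the rule that the highest priority wins are all assumptions made for the example. The stub scanners return no documents; real ones would call tree-sitter, ts-morph, or remark.

```typescript
// Sketch only: fields and methods not named in the issue are assumptions.
interface ScannerCapabilities {
  syntax: boolean;
  types?: boolean;
  references?: boolean;
  documentation?: boolean;
}

// Stand-in for the issue's Document interface, trimmed for brevity and
// renamed so it cannot clash with the DOM's global Document type.
interface ScanDocument {
  id: string;
  text: string;
  language: string;
}

interface LanguageScanner {
  name: string;
  extensions: string[]; // lowercase, with leading dot, e.g. ".ts"
  priority: number;     // assumption: higher priority wins for an extension
  capabilities: ScannerCapabilities;
  scan(file: string, source: string): ScanDocument[];
}

class ScannerRegistry {
  private scanners: LanguageScanner[] = [];

  register(scanner: LanguageScanner): void {
    this.scanners.push(scanner);
  }

  // Pick the highest-priority scanner registered for the file's extension.
  scannerFor(file: string): LanguageScanner | undefined {
    const base = file.slice(file.lastIndexOf("/") + 1);
    const dot = base.lastIndexOf(".");
    if (dot <= 0) return undefined; // no extension, or a dotfile
    const ext = base.slice(dot).toLowerCase();
    let best: LanguageScanner | undefined;
    for (const s of this.scanners) {
      if (s.extensions.indexOf(ext) === -1) continue;
      if (best === undefined || s.priority > best.priority) best = s;
    }
    return best;
  }
}

// Hypothetical helper that builds a scanner whose scan() returns nothing.
function stubScanner(
  name: string,
  extensions: string[],
  priority: number,
  capabilities: ScannerCapabilities
): LanguageScanner {
  return { name, extensions, priority, capabilities, scan: () => [] };
}

const registry = new ScannerRegistry();
registry.register(
  stubScanner("tree-sitter", [".ts", ".js", ".go", ".py", ".rs"], 0, { syntax: true })
);
registry.register(
  stubScanner("ts-morph", [".ts", ".tsx", ".js", ".jsx"], 10, {
    syntax: true, types: true, references: true, documentation: true,
  })
);
registry.register(
  stubScanner("remark", [".md"], 10, { syntax: true, documentation: true })
);
```

With this setup, a '.ts' file resolves to the ts-morph scanner because it outranks the tree-sitter fallback, while a '.go' file falls through to tree-sitter. A numeric priority is only one way to express "best scanner"; ranking by declared capabilities would also satisfy the criteria.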

Document Interface

interface Document {
  id: string;              // Unique identifier (file:name:line)
  text: string;            // Text to embed
  type: DocumentType;      // 'function' | 'class' | 'interface' | 'struct' | 'doc'
  language: string;        // 'typescript' | 'go' | 'python' | 'rust' | 'markdown'
  
  metadata: {
    file: string;          // Relative path from repo root
    startLine: number;
    endLine: number;
    name?: string;         // Symbol name (function/class)
    signature?: string;    // Full signature
    exported: boolean;     // Public API?
    docstring?: string;    // Doc comments/JSDoc
  };
}

interface ScannerCapabilities {
  syntax: boolean;         // Basic structure extraction
  types?: boolean;         // Type information
  references?: boolean;    // Cross-file references
  documentation?: boolean; // Doc comment extraction
}
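
To make the 'Document' shape concrete, here is a rough sketch of how a scanner might turn one extracted symbol into a document. The 'file:name:line' id format comes from the interface comment above. Everything else is an assumption for illustration: the 'ExtractedSymbol' input type, the 'makeDocumentId' and 'toDocument' helpers, the 'file:line' fallback for unnamed chunks, and the choice to build 'text' from the docstring followed by the signature. The interface is named 'ScanDocument' here only to avoid clashing with the DOM's global 'Document' type.

```typescript
type DocumentType = "function" | "class" | "interface" | "struct" | "doc";

// Mirrors the Document interface from the issue under a non-clashing name.
interface ScanDocument {
  id: string;
  text: string;
  type: DocumentType;
  language: string;
  metadata: {
    file: string;
    startLine: number;
    endLine: number;
    name?: string;
    signature?: string;
    exported: boolean;
    docstring?: string;
  };
}

// Hypothetical intermediate record a scanner might produce per symbol.
interface ExtractedSymbol {
  file: string;
  type: DocumentType;
  language: string;
  startLine: number;
  endLine: number;
  exported: boolean;
  name?: string;
  signature?: string;
  docstring?: string;
}

// id format from the issue: file:name:line.
// Assumption: unnamed chunks (e.g. Markdown sections) fall back to file:line.
function makeDocumentId(file: string, name: string | undefined, line: number): string {
  return name === undefined ? file + ":" + line : file + ":" + name + ":" + line;
}

function toDocument(s: ExtractedSymbol): ScanDocument {
  const metadata: ScanDocument["metadata"] = {
    file: s.file,
    startLine: s.startLine,
    endLine: s.endLine,
    exported: s.exported,
  };
  if (s.name !== undefined) metadata.name = s.name;
  if (s.signature !== undefined) metadata.signature = s.signature;
  if (s.docstring !== undefined) metadata.docstring = s.docstring;

  // Assumption: the embedded text is the docstring followed by the signature.
  const parts: string[] = [];
  if (s.docstring !== undefined) parts.push(s.docstring);
  if (s.signature !== undefined) parts.push(s.signature);

  return {
    id: makeDocumentId(s.file, s.name, s.startLine),
    text: parts.join("\n"),
    type: s.type,
    language: s.language,
    metadata,
  };
}

const exampleDoc = toDocument({
  file: "src/scanner.ts",
  type: "function",
  language: "typescript",
  startLine: 10,
  endLine: 24,
  exported: true,
  name: "scanFile",
  signature: "function scanFile(path: string): Document[]",
  docstring: "Scans one file.",
});
```

Here 'exampleDoc.id' is 'src/scanner.ts:scanFile:10'. What actually goes into 'text' is a design decision this issue leaves open; embedding the full source body instead of the signature is an equally plausible choice.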

Technical Requirements

  • Use 'tree-sitter' (Node.js bindings, NOT web-tree-sitter) for performance
  • Integrate standard tree-sitter grammars (typescript, go, python, rust)
  • Use 'ts-morph' for enhanced TypeScript extraction
  • Use 'remark' ecosystem for parsing Markdown
  • Implement efficient file traversal and language detection
  • Support configurable inclusion/exclusion patterns
  • Add comprehensive tests for each scanner implementation
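
The inclusion/exclusion requirement above might look something like the sketch below. It is dependency-free for illustration and handles only '*', '?', and '**'; a real implementation would more likely rely on an established glob library. The 'ScanOptions' shape and the 'globToRegExp' and 'shouldScan' names are assumptions, not part of this issue. Paths are assumed to be relative to the repo root and to use '/' separators.

```typescript
// Hypothetical options shape; the issue only asks for configurable patterns.
interface ScanOptions {
  include: string[]; // empty means "include everything"
  exclude: string[];
}

// Convert a simple glob into an anchored regular expression.
// Supports "*" (within one path segment), "?" (one character), and
// "**" (any number of segments). Other glob syntax is treated literally.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*") {
      if (glob.charAt(i + 1) === "*") {
        i++;
        if (glob.charAt(i + 1) === "/") {
          i++;
          re += "(?:.*/)?"; // "**/" matches zero or more directories
        } else {
          re += ".*";       // trailing "**" matches everything below
        }
      } else {
        re += "[^/]*";
      }
    } else if (c === "?") {
      re += "[^/]";
    } else if ("\\^$+.()|{}[]".indexOf(c) !== -1) {
      re += "\\" + c;       // escape regex metacharacters
    } else {
      re += c;
    }
  }
  return new RegExp("^" + re + "$");
}

function matchesAny(file: string, globs: string[]): boolean {
  for (const g of globs) {
    if (globToRegExp(g).test(file)) return true;
  }
  return false;
}

// A file is scanned if it matches an include pattern (or none are given)
// and matches no exclude pattern.
function shouldScan(file: string, options: ScanOptions): boolean {
  const included = options.include.length === 0 || matchesAny(file, options.include);
  return included && !matchesAny(file, options.exclude);
}

const scanOptions: ScanOptions = {
  include: ["**/*.ts", "**/*.md"],
  exclude: ["**/node_modules/**", "dist/**"],
};
```

With these options, 'src/index.ts' and 'README.md' are scanned, while anything under 'node_modules' or 'dist' is skipped. For efficient traversal, excluded directories would ideally be pruned during the directory walk rather than filtered file by file, and patterns compiled once rather than on every call.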

Branch: feat/repository-scanner
Priority: High
Estimate: 5 days
Parent Epic: #1
