Description
Implement a repository scanner using a hybrid architecture that combines 'tree-sitter' for universal multi-language support with 'ts-morph' for enhanced TypeScript/JavaScript analysis. This gives broad coverage (Go, Python, Rust) while keeping deep semantic understanding of TypeScript/JavaScript, our primary language.
Acceptance Criteria
- Architecture: Implement a 'ScannerRegistry' and 'LanguageScanner' interface to support pluggable scanners.
- Universal Support: Implement a base 'TreeSitterScanner' that supports Go, Python, and Rust (syntax only).
- Enhanced Support: Implement 'TypeScriptScanner' using 'ts-morph' for deep analysis (types, references) of TS/JS files.
- Documentation: Implement 'MarkdownScanner' using 'remark' for READMEs and docs.
- Output: All scanners produce documents conforming to a unified 'Document' interface, with metadata (language, type, start/end lines).
- Auto-detection: The registry automatically selects the best scanner for each file based on its extension.
- Capabilities: Each scanner exposes its capabilities (syntax, types, references, documentation).
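A minimal sketch of the registry and auto-detection criteria, assuming each scanner declares the file extensions it handles. The `name` and `extensions` fields, the `getScannerFor` method, and the rule that the scanner with the most enabled capabilities wins are illustrative assumptions, not a settled API; the `scan` method that would return `Document` objects is omitted to keep the focus on selection.

```typescript
// Sketch of a pluggable scanner registry with extension-based auto-detection.
// Field and method names are assumptions for illustration only.

interface ScannerCapabilities {
  syntax: boolean;
  types?: boolean;
  references?: boolean;
  documentation?: boolean;
}

interface LanguageScanner {
  readonly name: string; // hypothetical label, e.g. "tree-sitter" or "ts-morph"
  readonly extensions: readonly string[]; // lowercase, with leading dot, e.g. [".ts", ".tsx"]
  readonly capabilities: ScannerCapabilities;
  // scan(...) returning Document[] omitted for brevity
}

// Count enabled capabilities so richer scanners outrank syntax-only ones.
function capabilityScore(c: ScannerCapabilities): number {
  return [c.syntax, c.types, c.references, c.documentation].filter(Boolean).length;
}

// Lowercase extension of the file name; dotfiles and extensionless files yield "".
function extensionOf(filePath: string): string {
  const base = filePath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

export class ScannerRegistry {
  private readonly scanners: LanguageScanner[] = [];

  register(scanner: LanguageScanner): void {
    this.scanners.push(scanner);
  }

  // Pick the most capable scanner for the file's extension.
  // On a tie, the scanner registered first wins (strict ">" comparison).
  getScannerFor(filePath: string): LanguageScanner | undefined {
    const ext = extensionOf(filePath);
    if (ext === "") return undefined;
    let best: LanguageScanner | undefined;
    for (const scanner of this.scanners) {
      if (!scanner.extensions.includes(ext)) continue;
      if (best === undefined || capabilityScore(scanner.capabilities) > capabilityScore(best.capabilities)) {
        best = scanner;
      }
    }
    return best;
  }
}
```

Under this rule, if a syntax-only tree-sitter scanner and a ts-morph scanner both register `.ts`, TypeScript files go to ts-morph, while `.go`, `.py`, and `.rs` files, which only tree-sitter claims, go to tree-sitter. An explicit priority field would be an alternative if capability counts ever tie or mislead.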
Document Interface
```typescript
type DocumentType = 'function' | 'class' | 'interface' | 'struct' | 'doc';

interface Document {
  id: string;            // Unique identifier (file:name:line)
  text: string;          // Text to embed
  type: DocumentType;
  language: string;      // e.g. 'typescript' | 'go' | 'python' | 'rust' | 'markdown'
  metadata: {
    file: string;        // Relative path from repo root
    startLine: number;
    endLine: number;
    name?: string;       // Symbol name (function/class)
    signature?: string;  // Full signature
    exported: boolean;   // Public API?
    docstring?: string;  // Doc comments/JSDoc
  };
}
```
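As a hedged sketch of how a scanner might populate this interface, the helper below builds the `id` in the `file:name:line` form given in the comment and normalizes the file path to forward slashes so ids stay stable across operating systems. `SymbolInfo`, `makeDocumentId`, `toDocument`, and the `<anonymous>` placeholder for unnamed symbols are hypothetical names, not part of the spec.

```typescript
// Hypothetical sketch: mapping scanner-internal symbol data to the Document shape.
export type DocumentType = "function" | "class" | "interface" | "struct" | "doc";

export interface Document {
  id: string;
  text: string;
  type: DocumentType;
  language: string;
  metadata: {
    file: string;
    startLine: number;
    endLine: number;
    name?: string;
    signature?: string;
    exported: boolean;
    docstring?: string;
  };
}

// Hypothetical intermediate shape a scanner might collect before building a Document.
export interface SymbolInfo {
  file: string; // relative to repo root, possibly with OS-specific separators
  kind: DocumentType;
  language: string;
  text: string;
  startLine: number;
  endLine: number;
  exported: boolean;
  name?: string;
  signature?: string;
  docstring?: string;
}

// Build the "file:name:line" id; "<anonymous>" is an assumed placeholder for unnamed symbols.
export function makeDocumentId(file: string, name: string | undefined, startLine: number): string {
  return `${file}:${name ?? "<anonymous>"}:${startLine}`;
}

export function toDocument(s: SymbolInfo): Document {
  // Normalize Windows-style separators so ids match across platforms.
  const file = s.file.replace(/\\/g, "/");
  return {
    id: makeDocumentId(file, s.name, s.startLine),
    text: s.text,
    type: s.kind,
    language: s.language,
    metadata: {
      file,
      startLine: s.startLine,
      endLine: s.endLine,
      name: s.name,
      signature: s.signature,
      exported: s.exported,
      docstring: s.docstring,
    },
  };
}
```

Treat the id as an opaque key rather than parsing it back apart, since symbol names or paths could themselves contain `:`.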
```typescript
interface ScannerCapabilities {
  syntax: boolean;          // Basic structure extraction
  types?: boolean;          // Type information
  references?: boolean;     // Cross-file references
  documentation?: boolean;  // Doc comment extraction
}
```
Technical Requirements
- Use 'tree-sitter' (Node.js bindings, NOT web-tree-sitter) for performance
- Integrate standard tree-sitter grammars (typescript, go, python, rust)
- Use 'ts-morph' for enhanced TypeScript extraction
- Use 'remark' ecosystem for parsing Markdown
- Implement efficient file traversal and language detection
- Support configurable inclusion/exclusion patterns
- Add comprehensive tests for each scanner implementation
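A sketch of the traversal requirement using only Node's built-in `fs` and `path` modules. It assumes inclusion is expressed as a list of file extensions and exclusion as a list of directory names; the `WalkOptions` shape and `collectFiles` name are hypothetical, and a real implementation would likely accept glob patterns through a matching library instead. It walks iteratively with an explicit stack and does not follow symbolic links, which avoids cycles.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical options shape; real configuration may use glob patterns instead.
export interface WalkOptions {
  includeExtensions: string[]; // e.g. [".ts", ".go", ".py", ".rs", ".md"]
  excludeDirs: string[]; // directory names to skip anywhere, e.g. ["node_modules", ".git"]
}

// Collect matching files under root as sorted, POSIX-style paths relative to root.
export function collectFiles(root: string, opts: WalkOptions): string[] {
  const include = new Set(opts.includeExtensions.map((e) => e.toLowerCase()));
  const exclude = new Set(opts.excludeDirs);
  const results: string[] = [];
  const stack: string[] = [root];

  while (stack.length > 0) {
    const dir = stack.pop()!;
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      // Symlinks report neither isDirectory() nor isFile() here, so they are skipped.
      if (entry.isDirectory()) {
        if (!exclude.has(entry.name)) stack.push(full);
      } else if (entry.isFile() && include.has(path.extname(entry.name).toLowerCase())) {
        results.push(path.relative(root, full).split(path.sep).join("/"));
      }
    }
  }
  return results.sort();
}
```

The collected paths can then be handed to the scanner registry for extension-based scanner selection. Synchronous calls keep the sketch short; `fs.promises` would avoid blocking the event loop on large repositories.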
Branch: feat/repository-scanner
Priority: High
Estimate: 5 days
Parent Epic: #1