Foundation: Storage + Config System #52

@prosdev

🎯 Overview

Implement centralized storage for repository indexes and a project-level configuration system. This foundation enables clean project directories, cross-repository search, and per-project adapter configuration.

Part of Epic: #31

💾 Storage Strategy: Centralized Indexes

Overview

All repository indexes are stored globally in ~/.dev-agent/indexes/ rather than per-project. This enables:

  • ✅ Clean project directories (no large files to gitignore)
  • ✅ Cross-repository search and analysis
  • ✅ Shared indexes across clones
  • ✅ Easy storage management and cleanup
  • ✅ Indexes survive project moves/deletions (when keyed by git remote)

Directory Structure

~/.dev-agent/
  indexes/
    a1b2c3d4/              # Frontend (git remote hash)
      vectors.lance        # Code vectors
      github-state.json    # GitHub cache
      metadata.json        # Repository metadata
    e5f6a7b8/              # Backend
      vectors.lance
      github-state.json
      metadata.json
  
  cache/
    embeddings-model/      # Shared ML model (~100MB)
    github-api/            # GitHub API response cache
  
  config/
    global.json            # Global settings (optional)

Storage Location Algorithm

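import * as crypto from 'node:crypto';
import * as os from 'node:os';
import * as path from 'node:path';

// getGitRemote() and normalizeGitRemote() are helpers defined elsewhere.
async function getStoragePath(repositoryPath: string): Promise<string> {
  // 1. Try git remote (stable across clones)
  const gitRemote = await getGitRemote(repositoryPath);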
  
  if (gitRemote) {
    // Normalize: git@github.com:company/repo.git → company/repo
    const normalized = normalizeGitRemote(gitRemote);
    const hash = crypto.createHash('md5')
      .update(normalized)
      .digest('hex')
      .slice(0, 8);
    
    return path.join(os.homedir(), '.dev-agent/indexes', hash);
  }
  
  // 2. Fallback: absolute path hash (for non-git repos)
  const pathHash = crypto.createHash('md5')
    .update(path.resolve(repositoryPath))
    .digest('hex')
    .slice(0, 8);
  
  return path.join(os.homedir(), '.dev-agent/indexes', pathHash);
}
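
The algorithm above depends on normalizeGitRemote(), which this issue names but does not define. A minimal sketch of one possible version follows, assuming that SSH and HTTPS remotes for the same repository should collapse to the same lowercase owner/repo string. The storageKey() helper is a hypothetical name for the hashing step, and lowercasing is an assumption that suits GitHub but may not suit every host.

```typescript
import { createHash } from "node:crypto";

// Hypothetical implementation: the issue only shows the intended result
// (git@github.com:company/repo.git → company/repo).
export function normalizeGitRemote(remote: string): string {
  return remote
    .trim()
    .replace(/^[a-z+]+:\/\/([^@/]+@)?[^/]+\//i, "") // https://host/ or ssh://git@host/
    .replace(/^[^@/]+@[^:/]+:/, "")                 // scp-style git@host:
    .replace(/\/+$/, "")                            // trailing slashes
    .replace(/\.git$/i, "")                         // .git suffix
    .toLowerCase();                                 // assumption: case-insensitive host
}

// Hypothetical helper: first 8 hex characters of the MD5 digest,
// matching the hashing step in getStoragePath().
export function storageKey(input: string): string {
  return createHash("md5").update(input).digest("hex").slice(0, 8);
}

const sshKey = storageKey(normalizeGitRemote("git@github.com:company/repo.git"));
const httpsKey = storageKey(normalizeGitRemote("https://github.com/company/repo.git"));
console.log(sshKey === httpsKey); // both remotes resolve to the same index directory
```

Eight hex characters give 32 bits, so collisions between a handful of repositories are unlikely but possible. Comparing the remote recorded in metadata.json on load would be one way to detect them.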

Metadata File

Each index includes metadata for identification:

// ~/.dev-agent/indexes/a1b2c3d4/metadata.json
{
  "version": "1.0",
  "repository": {
    "path": "/Users/you/workspace/frontend",
    "remote": "git@github.com:company/frontend.git",
    "branch": "main",
    "lastCommit": "abc123..."
  },
  "indexed": {
    "timestamp": "2025-11-25T12:00:00Z",
    "files": 243,
    "components": 1847,
    "size": 52428800
  },
  "config": {
    "languages": ["typescript", "javascript"],
    "excludePatterns": ["**/node_modules/**"]
  }
}
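
The metadata file above could be typed and written roughly as follows. This is a sketch under assumptions: the IndexMetadata type and the writeMetadata() and readMetadata() helpers are hypothetical names, the git fields are optional because non-git repositories have no remote, and the write goes through a temporary file so that a crash does not leave a truncated metadata.json.

```typescript
import { mkdir, readFile, rename, writeFile } from "node:fs/promises";
import * as path from "node:path";

// Mirrors the metadata.json example; the type name is hypothetical.
export interface IndexMetadata {
  version: string;
  repository: { path: string; remote?: string; branch?: string; lastCommit?: string };
  indexed: { timestamp: string; files: number; components: number; size: number };
  config: { languages: string[]; excludePatterns: string[] };
}

// Write to a temp file, then rename it over the target.
export async function writeMetadata(indexDir: string, meta: IndexMetadata): Promise<void> {
  await mkdir(indexDir, { recursive: true });
  const target = path.join(indexDir, "metadata.json");
  const tmp = `${target}.${process.pid}.tmp`;
  await writeFile(tmp, JSON.stringify(meta, null, 2) + "\n", "utf8");
  await rename(tmp, target);
}

// Returns undefined when the index has no metadata.json yet.
export async function readMetadata(indexDir: string): Promise<IndexMetadata | undefined> {
  try {
    const text = await readFile(path.join(indexDir, "metadata.json"), "utf8");
    return JSON.parse(text) as IndexMetadata;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return undefined;
    throw err;
  }
}
```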

🔧 Configuration System

Config File: .dev-agent/config.json

{
  "version": "1.0",
  
  "repository": {
    "path": ".",
    "excludePatterns": ["**/node_modules/**", "**/dist/**"],
    "languages": ["typescript", "javascript"]
  },
  
  "mcp": {
    "adapters": {
      "search": { "enabled": true },
      "github": { "enabled": true },
      "plan": { "enabled": true },
      "explore": { "enabled": true },
      "status": { "enabled": false }
    }
  }
}

Config Schema

interface DevAgentConfig {
  version: string;
  repository: {
    path?: string;
    excludePatterns?: string[];
    languages?: string[];
  };
  mcp?: {
    adapters?: Record<string, AdapterConfig>;
  };
}

interface AdapterConfig {
  enabled: boolean;
  source?: string;  // For custom adapters
  settings?: Record<string, string | number | boolean>;
}
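
One way the validation and defaults-merging tasks could look against this schema is sketched below. The interfaces are repeated so the block compiles on its own. DEFAULT_CONFIG, validateConfig(), and mergeWithDefaults() are hypothetical names, and the default values are placeholders, since this issue does not list the real defaults.

```typescript
interface AdapterConfig {
  enabled: boolean;
  source?: string;
  settings?: Record<string, string | number | boolean>;
}
interface DevAgentConfig {
  version: string;
  repository: { path?: string; excludePatterns?: string[]; languages?: string[] };
  mcp?: { adapters?: Record<string, AdapterConfig> };
}

// Placeholder defaults (assumption).
const DEFAULT_CONFIG: DevAgentConfig = {
  version: "1.0",
  repository: { path: ".", excludePatterns: ["**/node_modules/**"], languages: [] },
  mcp: { adapters: {} },
};

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);
const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

// Returns every problem found, each prefixed with the path of the offending key.
export function validateConfig(raw: unknown): string[] {
  if (!isObject(raw)) return ["config: expected a JSON object"];
  const errors: string[] = [];
  if (typeof raw.version !== "string") errors.push("version: expected a string");
  const repo = raw.repository;
  if (!isObject(repo)) {
    errors.push("repository: expected an object");
  } else {
    if (repo.path !== undefined && typeof repo.path !== "string") {
      errors.push("repository.path: expected a string");
    }
    for (const key of ["excludePatterns", "languages"] as const) {
      if (repo[key] !== undefined && !isStringArray(repo[key])) {
        errors.push(`repository.${key}: expected an array of strings`);
      }
    }
  }
  const mcp = raw.mcp;
  if (mcp !== undefined && !isObject(mcp)) errors.push("mcp: expected an object");
  const adapters = isObject(mcp) ? mcp.adapters : undefined;
  if (adapters !== undefined && !isObject(adapters)) errors.push("mcp.adapters: expected an object");
  if (isObject(adapters)) {
    for (const [name, adapter] of Object.entries(adapters)) {
      if (!isObject(adapter) || typeof adapter.enabled !== "boolean") {
        errors.push(`mcp.adapters.${name}.enabled: expected a boolean`);
      }
    }
  }
  return errors;
}

// Shallow merge per section: user values win, arrays are replaced rather than concatenated.
export function mergeWithDefaults(user: DevAgentConfig): DevAgentConfig {
  return {
    version: user.version,
    repository: { ...DEFAULT_CONFIG.repository, ...user.repository },
    mcp: { adapters: { ...DEFAULT_CONFIG.mcp?.adapters, ...user.mcp?.adapters } },
  };
}
```

Collecting all errors before reporting, rather than stopping at the first, lets a user fix a broken config in one pass.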

Environment Variable Templating

Support ${VAR_NAME} syntax in config:

{
  "mcp": {
    "adapters": {
      "jira": {
        "enabled": true,
        "settings": {
          "apiKey": "${JIRA_API_KEY}"
        }
      }
    }
  }
}
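
A possible resolver for the ${VAR_NAME} syntax is sketched below. The function name resolveEnvVars() and the allowed variable-name characters are assumptions, as is the decision to throw on an unset variable instead of substituting an empty string.

```typescript
// Matches ${VAR_NAME}; the permitted name characters are an assumption.
const ENV_PATTERN = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g;

// Recursively walks a parsed config value and substitutes ${VAR_NAME} in
// every string. Returns a new value; the input is not mutated.
export function resolveEnvVars<T>(
  value: T,
  env: Record<string, string | undefined> = process.env,
): T {
  if (typeof value === "string") {
    const replaced = value.replace(ENV_PATTERN, (_match, name: string) => {
      const resolved = env[name];
      if (resolved === undefined) {
        throw new Error(`Environment variable ${name} is not set (referenced in config)`);
      }
      return resolved;
    });
    return replaced as unknown as T;
  }
  if (Array.isArray(value)) {
    return value.map((item) => resolveEnvVars(item, env)) as unknown as T;
  }
  if (typeof value === "object" && value !== null) {
    const entries = Object.entries(value).map(([k, v]) => [k, resolveEnvVars(v, env)]);
    return Object.fromEntries(entries) as unknown as T;
  }
  return value; // numbers, booleans, null
}

const resolved = resolveEnvVars(
  { mcp: { adapters: { jira: { enabled: true, settings: { apiKey: "${JIRA_API_KEY}" } } } } },
  { JIRA_API_KEY: "secret" },
);
console.log(resolved.mcp.adapters.jira.settings.apiKey); // prints "secret"
```

Passing the environment as a parameter keeps the function easy to unit test without touching process.env.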

🧠 Memory Management

Lazy Loading

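class MCPServer {
  private indexer?: RepositoryIndexer;
  private lastAccessed = Date.now();
  private readonly IDLE_TIMEOUT = 5 * 60 * 1000; // 5 minutes

  // repositoryPath and logger are used below; how they are injected is an
  // assumption, since this issue does not show the constructor.
  constructor(
    private readonly repositoryPath: string,
    private readonly logger: { info(message: string, meta?: object): void },
  ) {}
  
  async ensureIndexer(): Promise<RepositoryIndexer> {
    if (!this.indexer) {
      // Lazy load on first use
      const storagePath = await getStoragePath(this.repositoryPath);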
      
      this.indexer = new RepositoryIndexer({
        repositoryPath: this.repositoryPath,
        vectorStorePath: path.join(storagePath, 'vectors.lance'),
      });
      
      await this.indexer.initialize();
      this.logger.info('Loaded indexes', { storagePath });
    }
    
    this.lastAccessed = Date.now();
    return this.indexer;
  }
  
  // Auto-unload after idle period
  startIdleMonitor() {
    setInterval(() => {
      const idleTime = Date.now() - this.lastAccessed;
      
      if (idleTime > this.IDLE_TIMEOUT && this.indexer) {
        this.indexer.close();
        this.indexer = undefined;
        this.logger.info('Unloaded indexes (idle timeout)', {
          idleMinutes: Math.floor(idleTime / 60000)
        });
      }
    }, 60000); // Check every minute
  }
}
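
The lazy-load and idle-unload pattern above can also be factored into a small generic wrapper, which may make the planned auto-unload tests easier to write. The sketch below is not part of the proposed MCPServer: the IdleResource class and its method names are hypothetical. It differs from the class above in three ways: concurrent callers share one in-flight load, the clock is injectable, and the monitor timer is unref'd so it does not keep the process alive.

```typescript
import { clearInterval, setInterval } from "node:timers";

// Hypothetical generic wrapper around any lazily loaded resource.
export class IdleResource<T> {
  private loading?: Promise<T>;
  private lastAccessed = 0;
  private timer?: NodeJS.Timeout;

  constructor(
    private readonly load: () => Promise<T>,
    private readonly unload: (resource: T) => Promise<void> | void,
    private readonly idleTimeoutMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  get(): Promise<T> {
    this.lastAccessed = this.now();
    if (!this.loading) {
      // Cache the promise so concurrent callers trigger only one load;
      // forget it on failure so the next call can retry.
      this.loading = this.load().catch((err) => {
        this.loading = undefined;
        throw err;
      });
    }
    return this.loading;
  }

  // Unloads the resource if it has been idle longer than the timeout.
  // Returns true when an unload happened.
  async checkIdle(): Promise<boolean> {
    if (!this.loading || this.now() - this.lastAccessed <= this.idleTimeoutMs) return false;
    const pending = this.loading;
    this.loading = undefined;
    await this.unload(await pending);
    return true;
  }

  start(intervalMs = 60_000): void {
    this.timer = setInterval(() => {
      this.checkIdle().catch(() => { /* real code would log this */ });
    }, intervalMs);
    this.timer.unref(); // do not keep the process alive just for the monitor
  }

  stop(): void {
    if (this.timer) clearInterval(this.timer);
    this.timer = undefined;
  }
}
```

With an injected clock, a test can advance time by assignment and call checkIdle() directly instead of waiting five real minutes.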

📋 Implementation Tasks

Storage System

  • Implement getStoragePath() function (git remote → hash)
  • Create storage directory structure on first use
  • Implement metadata.json creation/updates
  • Update RepositoryIndexer to accept storage path parameter
  • Implement lazy loading in MCP server
  • Add idle timeout and auto-unload mechanism
  • Handle storage path resolution errors gracefully

Configuration System

  • Define DevAgentConfig TypeScript interface
  • Implement loadConfig() function with validation
  • Add environment variable templating (${VAR_NAME})
  • Create config file template/defaults
  • Update packages/cli/src/utils/config.ts to use new schema
  • Add config validation with helpful error messages
  • Support config file merging (defaults + user config)

Migration Path

  • Detect existing project-local indexes
  • Implement dev-agent storage migrate command
  • Move indexes to centralized location
  • Update configs to reference new storage paths
  • Clean up old local indexes (with confirmation)
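
The move and clean-up steps could be split as sketched below, so that nothing is deleted until the user confirms. This is only a sketch under assumptions: copyIndexToCentral() and cleanUpLocalIndex() are hypothetical names, and localIndexDir is a caller-supplied path, because this issue does not say where existing project-local indexes live.

```typescript
import { cp, mkdir, rm, stat } from "node:fs/promises";
import * as path from "node:path";

const exists = (p: string): Promise<boolean> => stat(p).then(() => true, () => false);

// Step 1: copy the project-local index into the centralized location.
// Copying (rather than renaming) also works across filesystems and leaves
// the original in place until clean-up is confirmed.
export async function copyIndexToCentral(
  localIndexDir: string,
  centralIndexDir: string,
): Promise<"copied" | "nothing-to-migrate"> {
  if (!(await exists(localIndexDir))) return "nothing-to-migrate";
  if (await exists(centralIndexDir)) {
    throw new Error(`Refusing to overwrite existing index at ${centralIndexDir}`);
  }
  await mkdir(path.dirname(centralIndexDir), { recursive: true });
  await cp(localIndexDir, centralIndexDir, { recursive: true });
  return "copied";
}

// Step 2: delete the old copy only after the caller has obtained confirmation.
export async function cleanUpLocalIndex(localIndexDir: string, confirmed: boolean): Promise<boolean> {
  if (!confirmed) return false;
  await rm(localIndexDir, { recursive: true, force: true });
  return true;
}
```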

✅ Acceptance Criteria

  • Config loads from .dev-agent/config.json with validation
  • Indexes stored in ~/.dev-agent/indexes/{hash}/ based on git remote
  • metadata.json created/updated for each index
  • Lazy loading works: the indexer loads only on the first tool call
  • Auto-unload works: the indexer unloads after 5 minutes idle
  • Environment variables resolved in config (${VAR_NAME})
  • Migration command successfully moves existing indexes
  • Storage path falls back to path hash for non-git repos
  • Config validation provides helpful error messages

🧪 Testing

  • Unit tests for getStoragePath() (git remote, fallback)
  • Unit tests for config loading/validation
  • Unit tests for environment variable templating
  • Integration test for lazy loading
  • Integration test for auto-unload
  • Integration test for migration path

🔗 Dependencies

Estimate: 1-1.5 days
Priority: High (foundation for other sub-issues)
