Description
Describe the bug
apm compile takes a very long time on projects that have APM dependencies installed under apm_modules/. The main contributor appears to be apm_modules/ not being excluded from the directory scanning phase, combined with an expensive pattern matching loop that amplifies the issue.
To Reproduce
- Create a project that depends on a large APM package (e.g., a team package that transitively pulls in many skills)
- Run apm install to populate apm_modules/
- Run apm compile --verbose
- Observe that compilation takes significantly longer than expected
Expected behavior
Compilation should complete in seconds. The apm_modules/ directory contains installed dependency source and should ideally be excluded from filesystem scanning during compilation, similar to how node_modules/ is already excluded.
Environment (please complete the following information):
- OS: macOS
- Python Version: 3.13.11
- APM Version: 0.7.4
- VSCode Version: N/A (CLI only)
Logs
⚙️ Starting context compilation...
Compiling for AGENTS.md (VSCode/Copilot) - detected .github/ folder
Verbose mode: showing source attribution and optimizer analysis
⏱️ 📊 Project Analysis: 8.6ms
⏱️ 🎯 Instruction Processing: 821231.2ms
Analyzing project structure...
├─ 90 directories scanned (max depth: 6)
├─ 369 files analyzed across 18 file types
└─ 10 instruction patterns detected
...
Generated 1 AGENTS.md file
┌─ Context efficiency: 71.0%
└─ Generation time: 1095937ms
Placement Distribution
└─ . 10 instructions from 10 sources
✅ Compilation completed successfully!
Project Analysis completes in ~9ms, but Instruction Processing takes ~821 seconds (about 13.7 minutes), and total generation time is ~1,096 seconds (about 18 minutes). The bottleneck appears to be the pattern matching phase scanning through apm_modules/ contents.
Additional context
Some investigation into src/apm_cli/compilation/context_optimizer.py surfaced a few things that seem to be contributing — sharing in case it's helpful:
1. apm_modules/ not in default exclusion list
_analyze_project_structure() hardcodes exclusions for node_modules, __pycache__, .git, dist, build — but not apm_modules/. When a project has large transitive dependencies (e.g., a team package pulling in multiple squads and skills), this adds hundreds of extra directories to the scan.
2. O(n×m×k) pattern matching in _find_matching_directories()
For each instruction pattern, the method iterates every cached directory, then every file in each directory, calling _file_matches_pattern() per file. With lots of directories from apm_modules/, this gets expensive quickly.
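One possible way to avoid the nested loop, as a rough sketch rather than the actual APM code: resolve each pattern once and derive the matching directories from the parents of the matched files. The function name directories_matching and the excluded parameter are hypothetical, and this assumes instruction patterns are plain glob patterns relative to the project root.

```python
from pathlib import Path

# Hypothetical default; the real exclusion list lives in context_optimizer.py.
EXCLUDED_PARTS = frozenset({"apm_modules", "node_modules"})


def directories_matching(base_dir: Path, pattern: str,
                         excluded: frozenset = EXCLUDED_PARTS) -> set[Path]:
    """Return directories (relative to base_dir) holding a file that matches pattern.

    The pattern is expanded once, so the cost scales with the number of
    matches instead of patterns x directories x files.
    """
    matched = set()
    for path in base_dir.glob(pattern):
        rel = path.relative_to(base_dir)
        # Skip anything that lives under an excluded directory name.
        if path.is_file() and not excluded.intersection(rel.parts):
            matched.add(rel.parent)
    return matched
```

Note that Path.glob still descends into apm_modules/ for ** patterns, so this only removes the per-directory, per-file matching loop; pruning the walk itself is a separate fix.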
3. Set[Path] recreation on every match check
_file_matches_pattern() converts cached glob results from List[str] to Set[Path] on every call rather than caching the converted set. This creates and discards large sets tens of thousands of times.
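A minimal sketch of caching the converted set, assuming the glob results are stored as a dict mapping each pattern to a list of path strings. The class GlobMatchCache and its attributes are made up for illustration and do not reflect the real structure of context_optimizer.py.

```python
from pathlib import Path


class GlobMatchCache:
    """Hypothetical helper: convert each pattern's glob results to a set once."""

    def __init__(self, glob_results: dict[str, list[str]]):
        self._glob_results = glob_results          # pattern -> list of path strings
        self._as_sets: dict[str, frozenset[Path]] = {}
        self.conversions = 0                       # counts how often a set is built

    def matches(self, file_path: Path, pattern: str) -> bool:
        converted = self._as_sets.get(pattern)
        if converted is None:
            # Build the set only on the first lookup for this pattern.
            converted = frozenset(
                Path(p) for p in self._glob_results.get(pattern, [])
            )
            self._as_sets[pattern] = converted
            self.conversions += 1
        return file_path in converted
```

With this shape, repeated checks against the same pattern become a single hash lookup instead of rebuilding the set each time.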
4. os.walk doesn't prune hardcoded exclusions
_analyze_project_structure() calls continue when it encounters hardcoded exclusion directories, but doesn't modify dirs[:] to prevent os.walk() from descending into those subtrees.
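For reference, pruning with os.walk usually looks something like the sketch below. The function scan_directories and the constant DEFAULT_EXCLUDED_DIRS are hypothetical names, and the set simply combines the exclusions listed in point 1 with apm_modules; the real method does more than collect directory paths.

```python
import os
from pathlib import Path

# Assumed exclusion set: the names from point 1 plus apm_modules.
DEFAULT_EXCLUDED_DIRS = frozenset(
    {"node_modules", "__pycache__", ".git", "dist", "build", "apm_modules"}
)


def scan_directories(base_dir, excluded=DEFAULT_EXCLUDED_DIRS):
    """Walk base_dir top-down and return visited directories relative to it."""
    scanned = []
    for root, dirs, _files in os.walk(base_dir):
        # Assigning to dirs[:] mutates the list in place, which is what
        # tells os.walk (in its default top-down mode) not to descend.
        dirs[:] = [d for d in dirs if d not in excluded]
        scanned.append(Path(root).relative_to(base_dir))
    return scanned
```

A plain continue only skips processing of the current directory; os.walk has already queued its children, so the whole subtree still gets traversed unless dirs is modified in place.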
Just adding apm_modules to the default exclusion list would likely make the biggest difference. Happy to help with a PR if that'd be useful!