[BUG] Compilation extremely slow on projects with apm_modules dependencies #154

@sergio-sisternes-epam

Describe the bug

apm compile takes a very long time (about 18 minutes in the run logged below) on projects that have APM dependencies installed under apm_modules/. The main contributor appears to be that apm_modules/ is not excluded from the directory scanning phase; an expensive pattern matching loop then amplifies the cost.

To Reproduce

  1. Create a project that depends on a large APM package (e.g., a team package that transitively pulls in many skills)
  2. Run apm install to populate apm_modules/
  3. Run apm compile --verbose
  4. Observe compilation takes significantly longer than expected

Expected behavior

Compilation should complete in seconds. The apm_modules/ directory contains installed dependency source and should ideally be excluded from filesystem scanning during compilation, similar to how node_modules/ is already excluded.

Environment (please complete the following information):

  • OS: macOS
  • Python Version: 3.13.11
  • APM Version: 0.7.4
  • VSCode Version: N/A (CLI only)

Logs

⚙️ Starting context compilation...
Compiling for AGENTS.md (VSCode/Copilot) - detected .github/ folder
Verbose mode: showing source attribution and optimizer analysis
⏱️  📊 Project Analysis: 8.6ms
⏱️  🎯 Instruction Processing: 821231.2ms
Analyzing project structure...
├─ 90 directories scanned (max depth: 6)
├─ 369 files analyzed across 18 file types
└─ 10 instruction patterns detected
...
Generated 1 AGENTS.md file
┌─ Context efficiency:    71.0%
└─ Generation time:       1095937ms

Placement Distribution
└─ .                              10 instructions from 10 sources
✅ Compilation completed successfully!

Project Analysis completes in ~9 ms, but Instruction Processing takes ~821 seconds (about 13.7 minutes). Total generation time is ~1,096 seconds (about 18.3 minutes). The bottleneck appears to be the pattern matching phase scanning through apm_modules/ contents.

Additional context

Some investigation into src/apm_cli/compilation/context_optimizer.py surfaced a few things that seem to be contributing. Sharing in case it's helpful:

1. apm_modules/ not in default exclusion list

_analyze_project_structure() hardcodes exclusions for node_modules, __pycache__, .git, dist, build — but not apm_modules/. When a project has large transitive dependencies (e.g., a team package pulling in multiple squads and skills), this adds hundreds of extra directories to the scan.

2. O(n×m×k) pattern matching in _find_matching_directories()

For each of the n instruction patterns, the method iterates over all m cached directories and then over the k files in each directory, calling _file_matches_pattern() once per file. With many extra directories coming from apm_modules/, this gets expensive quickly.
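One way to avoid the nested loop, sketched under assumptions: if the glob results for a pattern are already cached, the matching directories can be derived from the matched paths themselves instead of testing every file of every directory. The function name and parameters below are hypothetical and are not taken from context_optimizer.py.

```python
from pathlib import Path


def matching_directories(glob_results: list[str], known_dirs: set[Path]) -> set[Path]:
    """Return the known directories that directly contain a matched file.

    Hypothetical helper: `glob_results` stands in for the cached glob output
    of one pattern, `known_dirs` for the cached directory list.
    """
    # Derive candidate directories from the matches themselves rather than
    # testing every file of every directory against the pattern.
    parents = {Path(p).parent for p in glob_results}
    # Keep only directories the scan actually knows about (this also drops
    # anything that was excluded from the scan, such as apm_modules/).
    return parents & known_dirs


if __name__ == "__main__":
    results = ["src/a.py", "src/pkg/b.py", "apm_modules/x/c.py"]
    dirs = {Path("src"), Path("src/pkg"), Path("docs")}
    print(sorted(matching_directories(results, dirs)))
```

The work per pattern then scales with the number of matched files rather than with directories times files. Whether this fits the real method depends on what else it computes per directory, which the sketch does not model.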

3. Set[Path] recreation on every match check

_file_matches_pattern() converts cached glob results from List[str] to Set[Path] on every call rather than caching the converted set. This creates and discards large sets tens of thousands of times.
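A minimal sketch of the caching idea, assuming the glob results per pattern do not change during a compile run. The class name GlobMatchCache and its methods are hypothetical; the point is only that the list-to-set conversion happens once per pattern and every later check is a set lookup.

```python
from pathlib import Path


class GlobMatchCache:
    """Hypothetical helper: stores glob results per pattern as a set, built once."""

    def __init__(self) -> None:
        self._sets: dict[str, frozenset[Path]] = {}
        self.conversions = 0  # counts how often a list is converted to a set

    def add(self, pattern: str, results: list[str]) -> None:
        # Convert once, when the glob results are first cached.
        self._sets[pattern] = frozenset(Path(p) for p in results)
        self.conversions += 1

    def matches(self, pattern: str, file_path: Path) -> bool:
        # Membership test against the pre-built set; no per-call conversion.
        return file_path in self._sets.get(pattern, frozenset())


if __name__ == "__main__":
    cache = GlobMatchCache()
    cache.add("**/*.py", ["src/a.py", "src/b.py"])
    for _ in range(10_000):
        cache.matches("**/*.py", Path("src/a.py"))
    print(cache.conversions)  # the conversion ran once, not 10,000 times
```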

4. os.walk doesn't prune hardcoded exclusions

_analyze_project_structure() uses continue to skip hardcoded exclusion directories, but doesn't modify dirs[:] in place, so os.walk() still descends into those subtrees.
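For points 1 and 4, a sketch of what pruning could look like, assuming a plain os.walk() scan. The function name scan_directories and the DEFAULT_EXCLUDED constant are hypothetical; the exclusion names are the ones listed above plus apm_modules.

```python
import os
from pathlib import Path

# Assumed exclusion set: the names listed in this issue plus apm_modules.
DEFAULT_EXCLUDED = frozenset(
    {"node_modules", "__pycache__", ".git", "dist", "build", "apm_modules"}
)


def scan_directories(root: Path, excluded: frozenset[str] = DEFAULT_EXCLUDED) -> list[Path]:
    """Return every directory under `root` that is not inside an excluded subtree."""
    visited = []
    for current, dirs, _files in os.walk(root):
        # In-place slice assignment is what stops os.walk (topdown=True, the
        # default) from descending; rebinding `dirs` would have no effect.
        dirs[:] = [d for d in dirs if d not in excluded]
        visited.append(Path(current))
    return visited
```

With a bare continue, the excluded directory itself is skipped but every directory beneath it is still walked; with the dirs[:] assignment the whole subtree is never visited.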

Just adding apm_modules to the default exclusion list would likely make the biggest difference. Happy to help with a PR if that'd be useful!

Labels: bug (Something isn't working), needs-triage (New issue, not yet reviewed by maintainers)