perf: exclude apm_modules from compilation scanning and cache Set[Path] #157

Merged
danielmeppiel merged 3 commits into microsoft:main from sergio-sisternes-epam:fix/154-compilation-performance
Mar 4, 2026

@sergio-sisternes-epam
Collaborator

Description

apm compile scans the entire project tree during the optimization phase, including apm_modules/. On projects with large transitive dependencies, this causes Instruction Processing to take hundreds of seconds due to the O(n×m×k) pattern matching loop amplified by hundreds of extra directories.

This PR:

  1. Adds apm_modules to the hardcoded exclusion list in _analyze_project_structure(), _should_exclude_subdir(), and _get_all_files() — consistent with existing node_modules exclusion.
  2. Caches the Set[Path] conversion in _file_matches_pattern() to avoid recreating large sets on every call.
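
The first item can be sketched as follows. This is a minimal illustration, not the code in context_optimizer.py: `EXCLUDED` and `walk_project` are hypothetical names, and only node_modules and apm_modules are confirmed as excluded by this PR. It shows how pruning `dirnames` in place stops `os.walk` from ever descending into a dependency tree.

```python
import os
from pathlib import Path
from typing import Iterator

# Hypothetical exclusion set; the real list may contain more names.
EXCLUDED = {"node_modules", "apm_modules"}


def walk_project(root: Path) -> Iterator[Path]:
    """Yield project directories, skipping excluded subtrees entirely."""
    for dirpath, dirnames, _filenames in os.walk(root):
        # Assigning to dirnames[:] prunes in place, so os.walk (top-down by
        # default) never descends into the excluded directories.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED]
        yield Path(dirpath)
```

Pruning at walk time means the cost of an excluded tree is a single name check at its root, regardless of how many transitive dependencies sit below it.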

Fixes #154

Type of change

  • Bug fix
  • New feature
  • Documentation
  • Maintenance / refactor

Testing

  • Tested locally
  • All existing tests pass
  • Added tests for new functionality (if applicable)

New test classes (7 tests):

  • TestApmModulesExclusion (4 tests):
    • test_apm_modules_excluded_from_directory_cache — verifies no apm_modules paths leak into the cache
    • test_cache_size_unaffected_by_apm_modules — asserts cache size reflects only project dirs
    • test_os_walk_prunes_apm_modules — confirms _should_exclude_subdir() flags apm_modules
    • test_find_matching_dirs_ignores_apm_modules — verifies pattern matching skips apm_modules contents
  • TestGlobCacheReuse (1 test):
    • test_set_path_cached_across_calls — confirms Set[Path] is created once and reused
  • TestExpandGlobPattern (1 additional test, a follow-up to #153, "[BUG] Compilation fails on applyTo patterns with multiple brace groups"):
    • test_three_brace_groups — validates three nested brace groups expand correctly
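
To make the cache-reuse check concrete, here is a rough sketch of the kind of behavior a test like test_set_path_cached_across_calls could assert. `GlobMatcher`, `_set_cache`, and `set_builds` are hypothetical stand-ins, not the optimizer's real attributes; the assumption is that glob results are stored as lists of path strings and converted to a set once per pattern.

```python
from pathlib import Path
from typing import Dict, List, Set


class GlobMatcher:
    """Hypothetical stand-in for the optimizer's glob-matching state."""

    def __init__(self) -> None:
        self._glob_cache: Dict[str, List[str]] = {}  # pattern -> matched paths
        self._set_cache: Dict[str, Set[Path]] = {}   # pattern -> cached set
        self.set_builds = 0  # counts conversions, for illustration only

    def _matches_as_set(self, pattern: str) -> Set[Path]:
        cached = self._set_cache.get(pattern)
        if cached is None:
            # Build the set once; later calls reuse the same object.
            self.set_builds += 1
            cached = {Path(p) for p in self._glob_cache.get(pattern, [])}
            self._set_cache[pattern] = cached
        return cached

    def file_matches(self, file_path: Path, pattern: str) -> bool:
        return file_path in self._matches_as_set(pattern)
```

Checking identity (`is`) between two lookups, or a build counter as above, is one way to confirm the set is created once rather than on every call.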

All 43 tests pass.

Local benchmark on a project with transitive APM dependencies:

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Project Analysis | 8.6ms | 5.9ms | ~1.5x |
| Instruction Processing | 821,231ms (~13.7 min) | 16,886ms (~17s) | ~49x |
| Total Generation Time | 1,095,937ms (~18.3 min) | 17,555ms (~17.6s) | ~62x |
| Directories scanned | 90 | 78 | -12 |

Add apm_modules to the hardcoded exclusion list in _analyze_project_structure(),
_should_exclude_subdir(), and _get_all_files() so that installed dependency trees
are not scanned during compilation.

Cache the Set[Path] conversion in _file_matches_pattern() to avoid recreating
large sets on every call.

Before: ~821s instruction processing, ~1096s total on a project with apm_modules.
After:  ~17s instruction processing, ~18s total (~62x improvement).

Fixes microsoft#154
Copilot AI review requested due to automatic review settings March 4, 2026 15:57
Contributor

Copilot AI left a comment

Pull request overview

This PR improves apm compile performance by reducing filesystem scanning work during context optimization and by reusing cached data structures for glob matching. It also hardens brace-expansion for applyTo patterns with multiple brace groups (fixing #153) and avoids scanning installed dependencies under apm_modules/ (fixing #154).

Changes:

  • Exclude apm_modules/ from project scanning in the optimizer (similar to node_modules/).
  • Make _expand_glob_pattern() recursively expand multiple brace groups.
  • Cache Set[Path] conversions for glob results to avoid repeated set construction.
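
For readers unfamiliar with the brace-expansion change, the recursive approach can be sketched roughly as below. `expand_braces` is a hypothetical name and this is not the actual `_expand_glob_pattern()` implementation; the sketch handles sequential groups such as `{a,b}/{c,d}` and does not treat nested groups or de-duplication specially.

```python
import re
from typing import List

# Matches one brace group that contains no further braces inside it.
_BRACE_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern: str) -> List[str]:
    """Expand every {a,b} group in a glob pattern, one group per recursion."""
    match = _BRACE_GROUP.search(pattern)
    if match is None:
        return [pattern]  # base case: nothing left to expand
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded: List[str] = []
    for option in match.group(1).split(","):
        # Recurse so any remaining groups in the tail are expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded
```

Each call removes exactly one group, so a pattern with three two-option groups yields 2 × 2 × 2 = 8 plain glob patterns.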

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/apm_cli/compilation/context_optimizer.py | Adds apm_modules to exclusion logic; improves brace expansion; introduces set-caching for glob matches. |
| tests/unit/compilation/test_context_optimizer.py | Adds unit tests covering brace expansion, apm_modules exclusion, and set-cache reuse behavior. |

- Extract DEFAULT_EXCLUDED_DIRNAMES frozenset constant to eliminate
  duplication across _analyze_project_structure, _should_exclude_subdir,
  and _get_all_files (review comment 1)
- Use dedicated _glob_set_cache: Dict[str, Set[Path]] instead of
  overloading _glob_cache with '_set_' prefixed keys (review comment 2)
- Update docs/cli-reference.md and docs/compilation.md to list
  apm_modules in default exclusions (review comment 4)
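
A sketch of the first refactor, under stated assumptions: the constant name DEFAULT_EXCLUDED_DIRNAMES comes from the commit message, but its full contents are not shown in this PR, so only the two confirmed names appear here. The helper functions are hypothetical and simply illustrate several call sites sharing one immutable set.

```python
from typing import FrozenSet, List

# Only node_modules and apm_modules are confirmed by this PR; the real
# constant may list more directory names.
DEFAULT_EXCLUDED_DIRNAMES: FrozenSet[str] = frozenset(
    {"node_modules", "apm_modules"}
)


def should_exclude_subdir(name: str) -> bool:
    """Hypothetical helper: one membership test against the shared constant."""
    return name in DEFAULT_EXCLUDED_DIRNAMES


def filter_dirnames(dirnames: List[str]) -> List[str]:
    """Hypothetical helper: keep only directories that should be scanned."""
    return [d for d in dirnames if not should_exclude_subdir(d)]
```

A frozenset cannot be mutated by accident, and adding a new excluded directory becomes a one-line change instead of three.
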
sergio-sisternes-epam added the bug label Mar 4, 2026
sergio-sisternes-epam added this to the 0.8.0 milestone Mar 4, 2026
danielmeppiel merged commit 8a3e0aa into microsoft:main Mar 4, 2026
6 checks passed
sergio-sisternes-epam added a commit to sergio-sisternes-epam/apm that referenced this pull request Mar 4, 2026
Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[BUG] Compilation extremely slow on projects with apm_modules dependencies