Skip to content

[FEATURE] Performance optimization for deep dependency trees #171

@sergio-sisternes-epam

Description

@sergio-sisternes-epam

Is your feature request related to a problem? Please describe.

As APM adoption grows and packages develop deeper transitive dependency trees, several algorithmic bottlenecks in the install and compile paths become significant:

Bottleneck Location Current Complexity Impact
Primitive conflict detection primitives/models.py _add_with_conflict_detection() O(m²) — linear list scan per add Scales quadratically with primitive count
Cycle detection path tracking deps/apm_resolver.py detect_circular_dependencies() O(V×D)in operator on List[str] Degrades with tree depth
get_nodes_at_depth() during flatten deps/dependency_graph.py O(V × max_depth) — full node scan per depth level Repeated scans across 50 depth levels
from_apm_yml() YAML parsing models/apm_package.py 25 call sites, 0 caching — same files parsed 20-50× per operation Unnecessary disk I/O and CPU
get_config() reads config.py 5-10 disk reads per operation, no caching Redundant file I/O
read_constitution() compilation/constitution.py Read 3× per compile from claude_formatter.py and injector.py Redundant file I/O
Inheritance chain computation compilation/context_optimizer.py _get_inheritance_chain() Recomputed every call, no caching Redundant O(D) walks per instruction
Package downloads cli.py install loop Fully sequential — O(N×T) wall-clock 10 pkgs × 5s = 50s vs ~10s parallel
GitHub API resilience deps/github_downloader.py No 429/rate-limit handling Fails immediately under throttling

What works well today: BFS tree building uses deque + Set for O(1) dedup (queued_keys), glob pattern results are cached in ContextOptimizer, flattening uses Dict-based FlatDependencyMap with O(1) lookups, and output assembly uses list.append() + str.join() (not string concatenation). These should be preserved.

Describe the solution you'd like

A phased performance improvement plan with no breaking changes. All optimizations are internal — same inputs produce identical outputs. Lock file format v1 preserved.

Phase 1 — Performance Baseline (Benchmarks & Fixtures)

Entry gate — must complete before optimization work begins.

  • Add pytest-benchmark test suite in tests/benchmarks/ covering: dependency resolution (50+ packages, 5 depth levels), compilation (200+ primitives, 100+ directories), primitive conflict detection (500+ primitives), from_apm_yml() parse throughput, inheritance chain computation
  • Add deep dependency tree synthetic fixtures in tests/fixtures/: 100 packages, 10 depth levels, diamond dependencies, large primitive counts
  • Record baseline measurements in tests/benchmarks/BASELINE.md

Phase 2 — Data Structure Fixes (High Impact, Low Risk)

Can proceed in parallel with Phase 3.

  • Conflict detection O(m²) → O(m): Add Dict[str, int] name→index alongside each primitive list in PrimitiveCollection. Check dict (O(1)) instead of scanning for i, existing in enumerate(collection). File: src/apm_cli/primitives/models.py
  • Cycle detection O(V×D) → O(V+E): Add companion current_path_set: Set[str] in detect_circular_dependencies(). Use set for O(1) in checks, keep list only for cycle-path reporting. File: src/apm_cli/deps/apm_resolver.py
  • Depth-index for flatten: Pre-compute Dict[int, List[DependencyNode]] in DependencyTree on node insertion. get_nodes_at_depth() becomes O(1). File: src/apm_cli/deps/dependency_graph.py

Phase 3 — Caching Layer (High Impact, Medium Risk)

Can proceed in parallel with Phase 2.

  • from_apm_yml() parse cache: Module-level Dict[Path, APMPackage] keyed by resolved absolute path. No invalidation needed within a CLI invocation. Add clear_apm_yml_cache() for test isolation. File: src/apm_cli/models/apm_package.py
  • get_config() cache: Read-once-per-process module-level cache. File: src/apm_cli/config.py
  • read_constitution() cache: Cache after first read. File: src/apm_cli/compilation/constitution.py
  • Inheritance chain cache: Dict[Path, List[Path]] in ContextOptimizer, populated on first call per directory. File: src/apm_cli/compilation/context_optimizer.py

Phase 4 — Network Parallelization (Highest Wall-Clock Impact, Higher Risk)

Depends on Phases 2-3 stability.

  • Parallel package downloads: concurrent.futures.ThreadPoolExecutor (default 4 workers) for the flat install list after dependency resolution. Thread-safe Rich progress bars, collect-all-errors pattern, cap at 4-6 concurrent clones. Files: src/apm_cli/cli.py, src/apm_cli/deps/github_downloader.py
  • Git sparse-checkout for subdirectory packages: Use git sparse-checkout (git 2.25+) with full-clone fallback for virtual subdirectory packages. File: src/apm_cli/deps/github_downloader.py
  • (Stretch) Pre-fetch next BFS depth level in parallel during transitive resolution

Phase 5 — Resilience & Robustness (Medium Impact, Low Risk)

Independent of other phases.

  • Rate limit handling: Inspect X-RateLimit-Remaining and Retry-After headers. Exponential backoff with max 3 retries for 429/503. File: src/apm_cli/deps/github_downloader.py
  • Skip-if-exists optimization: When lockfile has a commit SHA and local clone HEAD matches, skip re-download. Files: src/apm_cli/deps/github_downloader.py, src/apm_cli/cli.py

Describe alternatives you've considered

  • Async/await rewrite: Full migration to asyncio for network I/O. Rejected as too invasive for the current stage — ThreadPoolExecutor achieves most of the parallel download benefit with far less refactoring.
  • Database-backed caching (SQLite): Persistent cross-invocation cache for parsed manifests. Rejected as over-engineering — CLI invocations are short-lived, and simple in-memory caches per process are sufficient.
  • CDN-based package hosting: Pre-built package tarballs on a CDN instead of git clones. Deferred — requires infrastructure outside APM's scope and git-based resolution works well for the current package scale.
  • npm-style global cache directory: A ~/.apm/cache/ shared across all projects. Considered for Phase 5 skip-if-exists but deferred — lockfile-based local dedup handles the common case.

Additional context

Scope boundaries:

  • All changes are internal — no CLI flag changes (except potentially --parallel-downloads N in Phase 4), no API changes, no lock file format changes.
  • Backward compatibility is a hard requirement: identical inputs must produce byte-identical outputs.
  • Phase 1 benchmarks serve as the regression gate for all subsequent phases.

Dependency graph for phases:

Phase 1 (baseline) ──┬──> Phase 2 (data structures) ──┬──> Phase 4 (parallelization)
                      └──> Phase 3 (caching)     ──────┘
                                                        Phase 5 (resilience) [independent]

Key files involved:

File Concern
src/apm_cli/primitives/models.py O(m²) conflict detection
src/apm_cli/deps/apm_resolver.py Cycle detection, BFS builder, flatten
src/apm_cli/deps/dependency_graph.py get_nodes_at_depth(), tree structures
src/apm_cli/models/apm_package.py Uncached from_apm_yml() (25 call sites)
src/apm_cli/config.py Uncached get_config()
src/apm_cli/compilation/constitution.py Uncached read_constitution()
src/apm_cli/compilation/context_optimizer.py Uncached inheritance chains
src/apm_cli/cli.py Sequential install loop
src/apm_cli/deps/github_downloader.py No parallelism, no rate limits

Design constraint — cache invalidation: The "no invalidation within a process" strategy is intentional. APM is a short-lived CLI tool; files don't change mid-operation. If APM ever becomes a long-running service, this must be revisited.

Thread safety note for Phase 4: Rich progress bars and GitPython may need thread-safe wrappers. A spike/prototype is recommended before committing to ThreadPoolExecutor. Alternative approach: asyncio + subprocess for git operations.

Metadata

Metadata

Assignees

No one assigned

    Labels

    acceptedDirection approved, safe to start workenhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions