Description
Is your feature request related to a problem? Please describe.
As APM adoption grows and packages develop deeper transitive dependency trees, several algorithmic bottlenecks in the install and compile paths become significant:
| Bottleneck | Location | Current Complexity | Impact |
|---|---|---|---|
| Primitive conflict detection | `primitives/models.py` `_add_with_conflict_detection()` | O(m²) — linear list scan per add | Scales quadratically with primitive count |
| Cycle detection path tracking | `deps/apm_resolver.py` `detect_circular_dependencies()` | O(V×D) — `in` operator on `List[str]` | Degrades with tree depth |
| `get_nodes_at_depth()` during flatten | `deps/dependency_graph.py` | O(V × max_depth) — full node scan per depth level | Repeated scans across 50 depth levels |
| `from_apm_yml()` YAML parsing | `models/apm_package.py` | 25 call sites, 0 caching — same files parsed 20-50× per operation | Unnecessary disk I/O and CPU |
| `get_config()` reads | `config.py` | 5-10 disk reads per operation, no caching | Redundant file I/O |
| `read_constitution()` | `compilation/constitution.py` | Read 3× per compile from `claude_formatter.py` and `injector.py` | Redundant file I/O |
| Inheritance chain computation | `compilation/context_optimizer.py` `_get_inheritance_chain()` | Recomputed every call, no caching | Redundant O(D) walks per instruction |
| Package downloads | `cli.py` install loop | Fully sequential — O(N×T) wall-clock | 10 pkgs × 5s = 50s vs ~10s parallel |
| GitHub API resilience | `deps/github_downloader.py` | No 429/rate-limit handling | Fails immediately under throttling |
What works well today (and should be preserved):
- BFS tree building uses `deque` + `Set` for O(1) dedup (`queued_keys`)
- Glob pattern results are cached in `ContextOptimizer`
- Flattening uses the Dict-based `FlatDependencyMap` with O(1) lookups
- Output assembly uses `list.append()` + `str.join()` (not string concatenation)
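For reference, the preserved BFS dedup pattern looks roughly like the sketch below. The names `bfs_collect` and `children_of` are illustrative stand-ins, not APM APIs; only the `deque` + set-of-queued-keys shape matches the resolver's actual approach:

```python
from collections import deque

def bfs_collect(root, children_of):
    """Breadth-first walk that dedups with a set while preserving FIFO
    order via a deque -- the pattern the resolver already uses.
    `children_of(node)` returns that node's direct dependencies."""
    queue = deque([root])
    queued_keys = {root}          # O(1) membership checks
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children_of(node):
            if child not in queued_keys:   # set lookup, not a list scan
                queued_keys.add(child)
                queue.append(child)
    return order
```

Note that a diamond dependency (two parents sharing one child) is enqueued exactly once, which is what keeps the walk linear in the number of edges.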
Describe the solution you'd like
A phased performance improvement plan with no breaking changes. All optimizations are internal — same inputs produce identical outputs. Lock file format v1 preserved.
Phase 1 — Performance Baseline (Benchmarks & Fixtures)
Entry gate — must complete before optimization work begins.
- Add a `pytest-benchmark` test suite in `tests/benchmarks/` covering: dependency resolution (50+ packages, 5 depth levels), compilation (200+ primitives, 100+ directories), primitive conflict detection (500+ primitives), `from_apm_yml()` parse throughput, and inheritance chain computation
- Add synthetic deep-dependency-tree fixtures in `tests/fixtures/`: 100 packages, 10 depth levels, diamond dependencies, large primitive counts
- Record baseline measurements in `tests/benchmarks/BASELINE.md`
Phase 2 — Data Structure Fixes (High Impact, Low Risk)
Can proceed in parallel with Phase 3.
- **Conflict detection O(m²) → O(m):** Add a `Dict[str, int]` name→index map alongside each primitive list in `PrimitiveCollection`. Check the dict (O(1)) instead of scanning with `for i, existing in enumerate(collection)`. File: `src/apm_cli/primitives/models.py`
- **Cycle detection O(V×D) → O(V+E):** Add a companion `current_path_set: Set[str]` in `detect_circular_dependencies()`. Use the set for O(1) `in` checks; keep the list only for cycle-path reporting. File: `src/apm_cli/deps/apm_resolver.py`
- **Depth index for flatten:** Pre-compute `Dict[int, List[DependencyNode]]` in `DependencyTree` on node insertion so `get_nodes_at_depth()` becomes O(1). File: `src/apm_cli/deps/dependency_graph.py`
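The cycle-detection fix could be sketched as below. This is a minimal illustration, not the resolver's real code: `graph` is assumed to map a package name to its dependency names, and a `done` set of fully explored nodes is included so the overall walk is O(V+E):

```python
def detect_cycles(graph, start):
    """DFS cycle detection keeping a companion set for O(1) membership
    alongside the ordered path list, which is kept only for reporting."""
    cycles = []
    path, path_set = [], set()
    done = set()  # fully explored nodes -- makes the walk O(V+E)

    def visit(node):
        if node in path_set:          # O(1) set check instead of a list scan
            cycles.append(path[path.index(node):] + [node])  # report the loop
            return
        if node in done:
            return
        path.append(node)
        path_set.add(node)
        for dep in graph.get(node, []):
            visit(dep)
        path.pop()
        path_set.discard(node)
        done.add(node)

    visit(start)
    return cycles
```

The list and set always hold the same members; the list exists purely so a reported cycle preserves discovery order.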
Phase 3 — Caching Layer (High Impact, Medium Risk)
Can proceed in parallel with Phase 2.
- **`from_apm_yml()` parse cache:** Module-level `Dict[Path, APMPackage]` keyed by resolved absolute path. No invalidation needed within a CLI invocation. Add `clear_apm_yml_cache()` for test isolation. File: `src/apm_cli/models/apm_package.py`
- **`get_config()` cache:** Read-once-per-process module-level cache. File: `src/apm_cli/config.py`
- **`read_constitution()` cache:** Cache after the first read. File: `src/apm_cli/compilation/constitution.py`
- **Inheritance chain cache:** `Dict[Path, List[Path]]` in `ContextOptimizer`, populated on first call per directory. File: `src/apm_cli/compilation/context_optimizer.py`
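The parse-cache pattern shared by these items could be sketched as follows. The names `parse_manifest` and `clear_manifest_cache` are illustrative stand-ins for `from_apm_yml()` and the proposed `clear_apm_yml_cache()`, and the cached value is a plain dict rather than a real `APMPackage`:

```python
from pathlib import Path
from typing import Dict

# Module-level cache keyed by resolved absolute path, so the same file
# reached via different relative paths hits one entry.
_manifest_cache: Dict[Path, dict] = {}

def parse_manifest(path: str) -> dict:
    key = Path(path).resolve()
    if key not in _manifest_cache:
        # Stand-in for the real YAML load in from_apm_yml()
        _manifest_cache[key] = {"path": str(key), "raw": key.read_text()}
    return _manifest_cache[key]

def clear_manifest_cache() -> None:
    """Test isolation hook, mirroring the proposed clear_apm_yml_cache()."""
    _manifest_cache.clear()
```

Because the cache lives for one process only, there is no invalidation logic; tests call the clear hook between cases.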
Phase 4 — Network Parallelization (Highest Wall-Clock Impact, Higher Risk)
Depends on Phases 2-3 stability.
- **Parallel package downloads:** Use `concurrent.futures.ThreadPoolExecutor` (default 4 workers) on the flat install list produced by dependency resolution. Thread-safe Rich progress bars, collect-all-errors pattern, cap at 4-6 concurrent clones. Files: `src/apm_cli/cli.py`, `src/apm_cli/deps/github_downloader.py`
- **Git sparse checkout for subdirectory packages:** Use `git sparse-checkout` (git 2.25+) with a full-clone fallback for virtual subdirectory packages. File: `src/apm_cli/deps/github_downloader.py`
- **(Stretch)** Pre-fetch the next BFS depth level in parallel during transitive resolution
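The collect-all-errors executor shape could look like this minimal sketch; `download_all` and `download_one` are hypothetical names standing in for the real clone routine, and progress-bar wiring is omitted:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_all(packages, download_one, max_workers=4):
    """Run downloads concurrently, collecting every failure instead of
    aborting on the first one, so the user sees all errors at once."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, pkg): pkg for pkg in packages}
        for fut in as_completed(futures):
            pkg = futures[fut]
            try:
                results[pkg] = fut.result()
            except Exception as exc:       # collect-all-errors pattern
                errors[pkg] = exc
    return results, errors
```

Capping `max_workers` at 4-6 keeps concurrent git clones (and GitHub API pressure) bounded while still collapsing most of the sequential wall-clock cost.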
Phase 5 — Resilience & Robustness (Medium Impact, Low Risk)
Independent of other phases.
- **Rate limit handling:** Inspect the `X-RateLimit-Remaining` and `Retry-After` headers. Exponential backoff with a maximum of 3 retries for 429/503 responses. File: `src/apm_cli/deps/github_downloader.py`
- **Skip-if-exists optimization:** When the lockfile has a commit SHA and the local clone's `HEAD` matches, skip the re-download. Files: `src/apm_cli/deps/github_downloader.py`, `src/apm_cli/cli.py`
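The backoff policy could be sketched as below. The `(status, headers, body)` request shape and the function names are illustrative assumptions, not the downloader's real interface; `sleep` is injectable so tests don't actually wait:

```python
import time

def fetch_with_backoff(do_request, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry 429/503 responses with exponential backoff, honoring a
    server-sent Retry-After header when present."""
    for attempt in range(max_retries + 1):
        status, headers, body = do_request()
        if status not in (429, 503):
            return status, body
        if attempt == max_retries:
            break
        retry_after = headers.get("Retry-After")
        # Prefer the server's hint; otherwise back off 1s, 2s, 4s, ...
        delay = float(retry_after) if retry_after else base_delay * (2 ** attempt)
        sleep(delay)
    raise RuntimeError(f"rate limited after {max_retries} retries (status {status})")
```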
Describe alternatives you've considered
- **Async/await rewrite:** Full migration to `asyncio` for network I/O. Rejected as too invasive for the current stage — `ThreadPoolExecutor` achieves most of the parallel-download benefit with far less refactoring.
- **Database-backed caching (SQLite):** Persistent cross-invocation cache for parsed manifests. Rejected as over-engineering — CLI invocations are short-lived, and simple in-memory per-process caches are sufficient.
- **CDN-based package hosting:** Pre-built package tarballs on a CDN instead of git clones. Deferred — requires infrastructure outside APM's scope, and git-based resolution works well at the current package scale.
- **npm-style global cache directory:** A `~/.apm/cache/` shared across all projects. Considered for the Phase 5 skip-if-exists work but deferred — lockfile-based local dedup handles the common case.
Additional context
Scope boundaries:
- All changes are internal — no CLI flag changes (except potentially `--parallel-downloads N` in Phase 4), no API changes, no lock file format changes.
- Backward compatibility is a hard requirement: identical inputs must produce byte-identical outputs.
- Phase 1 benchmarks serve as the regression gate for all subsequent phases.
Dependency graph for phases:
```
Phase 1 (baseline) ──┬──> Phase 2 (data structures) ──┬──> Phase 4 (parallelization)
                     └──> Phase 3 (caching) ──────────┘
Phase 5 (resilience) [independent]
```
Key files involved:
| File | Concern |
|---|---|
| `src/apm_cli/primitives/models.py` | O(m²) conflict detection |
| `src/apm_cli/deps/apm_resolver.py` | Cycle detection, BFS builder, flatten |
| `src/apm_cli/deps/dependency_graph.py` | `get_nodes_at_depth()`, tree structures |
| `src/apm_cli/models/apm_package.py` | Uncached `from_apm_yml()` (25 call sites) |
| `src/apm_cli/config.py` | Uncached `get_config()` |
| `src/apm_cli/compilation/constitution.py` | Uncached `read_constitution()` |
| `src/apm_cli/compilation/context_optimizer.py` | Uncached inheritance chains |
| `src/apm_cli/cli.py` | Sequential install loop |
| `src/apm_cli/deps/github_downloader.py` | No parallelism, no rate limits |
Design constraint — cache invalidation: The "no invalidation within a process" strategy is intentional. APM is a short-lived CLI tool; files don't change mid-operation. If APM ever becomes a long-running service, this must be revisited.
Thread safety note for Phase 4: Rich progress bars and GitPython may need thread-safe wrappers. A spike/prototype is recommended before committing to ThreadPoolExecutor. Alternative approach: asyncio + subprocess for git operations.