perf(compile): cache file walk and fix placement for narrow patterns #871
Pull request overview
This PR improves apm compile performance and correctness by caching filesystem traversal results and adjusting low-distribution instruction placement to use lowest-common-ancestor coverage, reducing repeated os.walk/iterdir() work and avoiding incorrect root placements for narrow applyTo patterns.
Changes:
- Switch primitive discovery `find_primitive_files()` from `glob.glob(recursive=True)` to `os.walk` with early directory pruning and shared skip dirs.
- Add `DEFAULT_SKIP_DIRS` constant and use it to prune traversal in both primitives discovery and compilation analysis.
- Update `ContextOptimizer` to cache per-directory file lists and use them for glob matching, directory matching, and stats computation; adjust low-distribution placement to use minimal-coverage (LCA) placement.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/unit/primitives/test_discovery_walk.py | Adds unit tests for the new walk-based discovery helpers and verifies ContextOptimizer’s cached glob behavior. |
| src/apm_cli/primitives/discovery.py | Implements _glob_match and updates find_primitive_files() to walk/prune instead of glob.glob(). |
| src/apm_cli/constants.py | Introduces DEFAULT_SKIP_DIRS to centralize unconditional traversal skips. |
| src/apm_cli/compilation/context_optimizer.py | Adds _directory_files_cache and rewires _cached_glob / matching / stats; changes single-point placement to use LCA. |
| CHANGELOG.md | Documents the behavior/perf changes under Unreleased. |
    Strategy: Place at the lowest common ancestor of all matching directories.
    This is the most specific directory that still provides full hierarchical
    coverage, avoiding pollution of unrelated subtrees.
    """
    candidates = self._generate_all_candidates(matching_directories, instruction)
    # Find the deepest directory that covers all matches
    minimal_coverage = self._find_minimal_coverage_placement(matching_directories)
    if minimal_coverage and minimal_coverage in self._directory_cache:
        return [minimal_coverage]

_optimize_single_point_placement() now uses lowest-common-ancestor placement, which is the crux of the bugfix described in the PR. There are existing tests for single-point placement and for LCA-at-root sibling coverage, but there is no test that asserts a non-root LCA is chosen when multiple matching directories exist under a deep subtree (the regression you mention like Engine/Plugins/PCG*/**/*). Add a unit test that constructs such a tree and asserts the placement directory is that subtree LCA, not the project root.
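
As an illustration of the placement rule this comment asks to test, here is a minimal sketch of a lowest-common-ancestor computation over directory paths. The helper name `lowest_common_ancestor` and the `PCGCore` / `PCGExtras` directory names are hypothetical; the PR's `_find_minimal_coverage_placement` may be implemented differently.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def lowest_common_ancestor(directories: Iterable[Path]) -> Optional[Path]:
    """Return the deepest directory that contains every path in `directories`.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    paths = [str(d) for d in directories]
    if not paths:
        return None
    # os.path.commonpath compares whole path components, unlike
    # os.path.commonprefix, which compares raw characters.
    return Path(os.path.commonpath(paths))


# Two matching directories that share a deep, non-root subtree.
matches = [
    Path("Engine/Plugins/PCGCore/Source"),
    Path("Engine/Plugins/PCGExtras/Source"),
]
lca = lowest_common_ancestor(matches)
print(lca)
```

A regression test along the lines requested would build such a tree on disk and assert that the chosen placement equals this subtree ancestor rather than the project root.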
    ### Changed

    - `find_primitive_files()` now uses `os.walk` with early directory pruning instead of `glob.glob(recursive=True)`, so `compilation.exclude` patterns prevent traversal into expensive subtrees (e.g. large game-engine asset trees). Adds shared `DEFAULT_SKIP_DIRS` constant used by both discovery and compilation.
    - `ContextOptimizer` now builds a `_directory_files_cache` during `_analyze_project_structure()` and uses it for `_cached_glob`, `_find_matching_directories`, and `_directory_matches_pattern`, eliminating repeated `os.walk` / `iterdir()` calls. Directory pruning is moved before the depth check so excluded subtrees are never descended into. Stats loop rewritten from O(N^2) per-directory inheritance walk to O(N) with pre-computed pattern sets. Low-distribution placement now uses `_find_minimal_coverage_placement` directly instead of a pollution-scored candidate search that incorrectly biased toward the project root.

The new changelog bullets under ## [Unreleased] do not follow the repo format rules: each entry should end with a PR reference like (#<number>), and be one logical change per line. Please add the appropriate PR number(s) to these entries (and consider splitting the long ContextOptimizer bullet if needed).
    # Match files against the requested patterns
    for file_name in files:
        file_path = current / file_name
        rel_str = str(file_path.relative_to(base_path)).replace(os.sep, '/')
        for pattern in patterns:
            if _glob_match(rel_str, pattern):
                all_files.append(file_path)
                break

find_primitive_files() iterates files from os.walk() without sorting, which can make primitive discovery order nondeterministic across filesystems. Because local primitives with the same name are conflict-resolved by first-seen order, this can lead to different winners on different machines. Sort files (and ideally return a consistently sorted valid_files) to preserve deterministic behavior similar to the old glob.glob() path ordering.
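
One way to get a deterministic order from `os.walk` is to sort both the directory list and the file list on every iteration. The sketch below assumes a stand-in skip set (`SKIP_DIRS`) and a hypothetical helper `walk_sorted`; it is not the PR's `find_primitive_files()`.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Set

# Stand-in for the PR's DEFAULT_SKIP_DIRS; the real contents are not shown here.
SKIP_DIRS: Set[str] = {".git", "node_modules"}


def walk_sorted(base: Path) -> List[str]:
    """Walk `base` top-down in a stable order, pruning skipped directories."""
    found: List[str] = []
    for root, dirs, files in os.walk(base):
        # Assigning to dirs[:] in place both prunes skipped directories
        # (os.walk never descends into them) and fixes the visit order.
        dirs[:] = sorted(d for d in dirs if d not in SKIP_DIRS)
        for name in sorted(files):
            rel = (Path(root) / name).relative_to(base)
            found.append(rel.as_posix())
    return found


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for rel in ("b/z.md", "a/y.md", "a/x.md", "node_modules/pkg/skip.md"):
        target = base / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    result = walk_sorted(base)

print(result)
```

Sorting inside the walk keeps first-seen conflict resolution stable regardless of the order the filesystem returns directory entries in.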
     dir_name = os.path.basename(dir_path)
    -return dir_name in skip_patterns
    \ No newline at end of file
    +return dir_name in DEFAULT_SKIP_DIRS
    \ No newline at end of file

_should_skip_directory() is now unused (no references in the repo) and duplicates the DEFAULT_SKIP_DIRS check already done inline during traversal. Consider removing it to avoid dead code, or reintroduce a call site if it is meant to be the canonical skip check.
    # These never contain APM primitives or user source files and can be
    # very large (e.g. node_modules, .git objects). Used by discovery,
    # compilation, and content hashing to avoid expensive walks.
    # NOTE: .apm is intentionally absent -- it is where primitives live.

The comment for DEFAULT_SKIP_DIRS says these directories "never contain APM primitives", but apm_modules/ can contain dependency primitives (it is just intentionally excluded from local traversal). Reword the comment to reflect that these dirs are skipped because they are not relevant for project-source analysis / local primitive discovery. Also consider updating the docs default-exclusions list (e.g. reference/cli-commands.md) since venv/ is now an unconditional skip.
    -# These never contain APM primitives or user source files and can be
    -# very large (e.g. node_modules, .git objects). Used by discovery,
    -# compilation, and content hashing to avoid expensive walks.
    -# NOTE: .apm is intentionally absent -- it is where primitives live.
    +# These directories are not relevant to project-source analysis or local
    +# primitive discovery and may be very large (e.g. node_modules, .git
    +# objects). Used by discovery, compilation, and content hashing to avoid
    +# expensive walks.
    +# NOTE: .apm is intentionally absent because local project primitives may
    +# live there and should still be discovered.

@microsoft-github-policy-service agree |
99cd6a6 to 99bc8b0
    @@ -1,6 +1,6 @@
    -"""Context Optimizer for APM distributed compilation system.
    +"""Context Optimizer for APM distributed compilation system.

File contains a UTF-8 BOM / non-ASCII character at the start of the module (before the opening triple-quote). Repo policy requires printable ASCII only; this can also cause subtle tooling diffs and Windows encoding issues. Remove the BOM so the first character is a plain ASCII '"'.
    # Fast path: build the glob match set for each expanded pattern,
    # then derive directory→count from matched file parents.
    dir_match_counts: Dict[Path, int] = {}

Non-ASCII Unicode arrow character ('→') appears in a source comment. The project requires printable ASCII-only output/source to avoid Windows cp1252 failures. Replace it with an ASCII sequence like '->'.
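
Both this comment and the BOM comment above describe issues that can be checked mechanically. The following is a rough sketch of such a check, assuming "printable ASCII" means code points 32 to 126 plus tab and newline; the function name `find_non_ascii` is hypothetical and the repo may already have its own lint for this.

```python
import codecs
from typing import List, Tuple


def find_non_ascii(data: bytes) -> Tuple[bool, List[Tuple[int, int, str]]]:
    """Return (has_bom, [(line, column, char), ...]) for UTF-8 source bytes."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    # The utf-8-sig codec drops a leading BOM if one is present.
    text = data.decode("utf-8-sig")
    hits: List[Tuple[int, int, str]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Assumed policy: allow 32..126 and tab; flag everything else.
            if ord(ch) > 126 or (ord(ch) < 32 and ch != "\t"):
                hits.append((line_no, col, ch))
    return has_bom, hits


# A sample with a BOM and an arrow character in a comment.
sample = codecs.BOM_UTF8 + "# directory\u2192count\nx = 1\n".encode("utf-8")
has_bom, hits = find_non_ascii(sample)
print(has_bom, hits)
```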
    +dir_prefix = directory
     direct_children = [
    -    child for child in directory.iterdir()
    -    if child.is_dir() and child in self._directory_cache
    +    child_dir for child_dir in self._directory_cache
    +    if child_dir.parent == dir_prefix and child_dir != dir_prefix
     ]

The new direct-children lookup iterates over all entries in self._directory_cache each time _calculate_inheritance_pollution runs. Since this method is called for each placement candidate, this can reintroduce O(N^2) behavior on large repos (trading syscalls for Python-level full scans). Consider building a children_by_parent: Dict[Path, List[Path]] adjacency map once during _analyze_project_structure() and reading direct_children from that map here.
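
The adjacency map suggested here can be sketched as follows. `build_children_map` and the sample paths are hypothetical; in the PR the input would be the keys of `self._directory_cache`, and the map would be built once during project analysis.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def build_children_map(directories: Iterable[Path]) -> Dict[Path, List[Path]]:
    """Group directories by parent in one pass over the cached directory set."""
    children: Dict[Path, List[Path]] = defaultdict(list)
    for directory in directories:
        parent = directory.parent
        # A filesystem root (or ".") is its own parent; skip it so that it
        # never appears as its own child.
        if parent != directory:
            children[parent].append(directory)
    return children


cached_dirs = [
    Path("repo"),
    Path("repo/src"),
    Path("repo/docs"),
    Path("repo/src/core"),
]
children_by_parent = build_children_map(cached_dirs)

# Each later lookup is a dict access instead of a scan over every directory.
direct_children = children_by_parent.get(Path("repo"), [])
print(direct_children)
```

Building the map costs one pass over N directories, after which each per-candidate lookup no longer depends on N.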
99bc8b0 to 1debce8
    pattern_dir_sets: Dict[str, set] = {}
    for instruction in all_instructions:
        pattern = instruction.apply_to
        if pattern and pattern not in pattern_dir_sets:
            pattern_dir_sets[pattern] = self._pattern_cache.get(pattern, set())

pattern_dir_sets is annotated as Dict[str, set], which loses the element type and makes the later membership check (directory in ...) harder to reason about. Consider tightening this to dict[str, set[Path]] (or Dict[str, Set[Path]]) to match what _pattern_cache actually stores.
    - HYBRID-skill review pipeline: `apm-review-panel` now produces a single CEO-synthesized verdict per run (no per-persona spam), with Hybrid E auth-expert routing and `python-architect`'s mandatory three-artifact contract. PRs get one high-signal comment. (#882, #905, #907, #908)
    - Faster primitive discovery on large repos: `compilation.exclude` patterns now prune traversal at the directory level instead of post-filtering. (#870)
    - `ContextOptimizer` reuses a cached per-directory file list built during project analysis for glob matching, directory matching, and stats, eliminating repeated `os.walk` / `iterdir()` calls and rewriting the stats loop from O(N^2) to O(N). Low-distribution `applyTo` patterns are now placed at their lowest common ancestor instead of the project root. (#871)

This changelog entry was added under the already-versioned ## [0.9.3] - 2026-04-26 section. Per Keep a Changelog conventions used in this repo, new PR entries should go under ## [Unreleased] until the release is cut. Please move this bullet to the Unreleased section (unless you are intentionally editing historical release notes as part of a release PR).
    # Fallback: walk up from minimal_coverage until we find a cached directory
    if minimal_coverage:
        current = minimal_coverage
        while current != self.base_dir:
            if current in self._directory_cache:
                return [current]
            current = current.parent

In the fallback loop that walks upward from minimal_coverage, the condition is only while current != self.base_dir:. If self.base_dir is not in current's ancestor chain (e.g., due to path resolution differences) and current reaches the filesystem root (current.parent == current), this will never terminate. Add a root/parent-stability guard (and/or compare on resolved paths) to guarantee termination.
    -# Fallback: walk up from minimal_coverage until we find a cached directory
    -if minimal_coverage:
    -    current = minimal_coverage
    -    while current != self.base_dir:
    -        if current in self._directory_cache:
    -            return [current]
    -        current = current.parent
    +# Fallback: walk up from minimal_coverage until we find a cached directory.
    +# Compare resolved paths to avoid mismatches caused by different path
    +# representations, and stop if we reach a filesystem root where parent
    +# traversal no longer makes progress.
    +if minimal_coverage:
    +    current = minimal_coverage
    +    resolved_base_dir = self.base_dir.resolve()
    +    while current.resolve() != resolved_base_dir:
    +        if current in self._directory_cache:
    +            return [current]
    +        parent = current.parent
    +        if parent == current:
    +            break
    +        current = parent
1debce8 to 96c8364
    """Second call with same pattern returns cached result."""
    (self.base / "a.py").touch()
    optimizer = ContextOptimizer(base_dir=str(self.base))
    first = optimizer._cached_glob("**/*.py")
    second = optimizer._cached_glob("**/*.py")
    self.assertIs(first, second)

This test asserts caching via object identity (assertIs(first, second)), which is brittle: the implementation could legitimately return an equivalent (but different) list instance while still being correctly cached. Prefer asserting on equality and/or verifying the second call does not rebuild (e.g., by inspecting _glob_cache or using a spy/mock) rather than requiring the same list object.
    -"""Second call with same pattern returns cached result."""
    -(self.base / "a.py").touch()
    -optimizer = ContextOptimizer(base_dir=str(self.base))
    -first = optimizer._cached_glob("**/*.py")
    -second = optimizer._cached_glob("**/*.py")
    -self.assertIs(first, second)
    +"""Second call with same pattern reuses cached glob data."""
    +(self.base / "a.py").touch()
    +optimizer = ContextOptimizer(base_dir=str(self.base))
    +first = optimizer._cached_glob("**/*.py")
    +second = optimizer._cached_glob("**/*.py")
    +self.assertEqual(first, second)
    +self.assertIn("**/*.py", optimizer._glob_cache)
    +self.assertEqual(first, optimizer._glob_cache["**/*.py"])
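
The spy/mock alternative mentioned in the comment can be sketched with `unittest.mock.patch(..., wraps=...)`, which keeps the real function's behavior while counting calls. `GlobCache` below is a hypothetical stand-in for a memoising glob helper, not the PR's `ContextOptimizer`.

```python
import glob
import os
import tempfile
from pathlib import Path
from typing import Dict, List
from unittest import mock


class GlobCache:
    """Hypothetical memoising wrapper around glob.glob, for illustration only."""

    def __init__(self, base_dir: str) -> None:
        self.base_dir = base_dir
        self._cache: Dict[str, List[str]] = {}

    def cached_glob(self, pattern: str) -> List[str]:
        if pattern not in self._cache:
            full_pattern = os.path.join(self.base_dir, pattern)
            self._cache[pattern] = sorted(glob.glob(full_pattern, recursive=True))
        return self._cache[pattern]


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.py").touch()
    cache = GlobCache(tmp)
    # wraps= forwards calls to the real glob.glob while recording them.
    with mock.patch("glob.glob", wraps=glob.glob) as glob_spy:
        first = cache.cached_glob("**/*.py")
        second = cache.cached_glob("**/*.py")
        call_count = glob_spy.call_count

print(call_count, first == second)
```

Asserting on the call count encodes the "no rescan" guarantee directly, without requiring that both calls return the same list object.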
    # Analyze files in this directory and cache file paths
    dir_files = []
    for file in files:
        if file.startswith('.'):
            continue

        file_path = current_path / file
        dir_files.append(file_path)

    if dir_files:
        self._directory_files_cache[current_path] = dir_files

_analyze_project_structure() applies exclude_patterns only at the directory level. Files that match a file-level exclude pattern (e.g. "/*.dll" or "/*.generated.h") will still be added to dir_files/_directory_files_cache, which means _cached_glob() and downstream matching/placement can still "see" excluded files. Filter file_path with _should_exclude_path(file_path) (or should_exclude(file_path, ...)) before adding it to dir_files, and add/adjust a unit test covering file-level excludes.
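
A minimal sketch of filtering at the file level before a path enters the per-directory cache. It assumes exclusion can be decided by matching the file's basename against simple patterns such as `*.dll`; the real `_should_exclude_path` semantics are not shown in this thread and may differ. `build_file_cache` is a hypothetical name.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def build_file_cache(
    rel_files: Iterable[str], exclude_patterns: Iterable[str]
) -> Dict[str, List[str]]:
    """Group files by parent directory, dropping excluded files up front."""
    patterns = list(exclude_patterns)
    cache: Dict[str, List[str]] = {}
    for rel in rel_files:
        path = PurePosixPath(rel)
        # Assumption: basename matching is enough for this illustration.
        if any(fnmatch.fnmatchcase(path.name, pat) for pat in patterns):
            continue
        cache.setdefault(path.parent.as_posix(), []).append(rel)
    return cache


files = ["bin/tool.dll", "src/main.cpp", "src/api.generated.h"]
cache = build_file_cache(files, ["*.dll", "*.generated.h"])
print(cache)
```

Because excluded files never reach the cache, every consumer of the cache (glob matching, directory matching, stats) sees the same filtered view.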
    @@ -14,9 +14,9 @@
     from pathlib import Path
     from typing import Any, Dict, List, Optional, Set, Tuple
     from functools import lru_cache

lru_cache is imported but not used anywhere in this module. Please remove the unused import to avoid confusion and keep the file clean.
    -from functools import lru_cache
96c8364 to 1ae612c
Build a _directory_files_cache during _analyze_project_structure() so that _cached_glob(), _find_matching_directories(), and _directory_matches_pattern() all work from the same in-memory file list instead of issuing repeated os.walk / iterdir() calls against disk.

Key changes in context_optimizer.py:
- _analyze_project_structure: populate _directory_files_cache[dir] for later use; dirs[:] pruning runs BEFORE depth/exclusion checks so os.walk never descends into excluded subtrees
- _cached_glob: replaces glob.glob(cwd=base_dir) with a scan of _directory_files_cache using _glob_match() from discovery.py
- _find_matching_directories: fast path for ** patterns derives directory hits from the cached glob set (no iterdir()); slow path for non-recursive patterns iterates cached files
- _calculate_optimization_stats: rewrite O(N^2) efficiency loop to O(N) using pre-computed pattern_dir_sets from _pattern_cache
- _optimize_low_distribution_placement: go straight to _find_minimal_coverage_placement (lowest common ancestor) instead of the pollution-scored candidate search that biased toward root; fixes instruction files for narrow applyTo globs landing at ./ when all matching files live under a specific subtree
- Drop local DEFAULT_EXCLUDED_DIRNAMES; use DEFAULT_SKIP_DIRS from constants (introduced in perf/discovery-prune)

Tests: TestCachedGlobUsesFileList (4 tests) and TestSinglePointPlacementNonRootLCA (regression for narrow applyTo globs)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1ae612c to db1da2f
The PR was rebased onto main, which already contains the cache implementation under a different attribute name (_directory_cache, not _directory_files_cache). Strip the four tests that reference the absent attribute / file-level exclude behaviour, keep the two regression tests that exercise behaviour actually present in main: _glob_cache reuse and non-root LCA placement for narrow applyTo patterns.

Update CHANGELOG entry to reflect the regression-coverage scope (microsoft#871).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
APM Review Panel Verdict: REJECT
Required before merge (1 item)
Nits (5 items, skip if you want)
CEO arbitration

The panel reached consensus on a single required finding, raised exclusively by the python-architect: TestSinglePointPlacementNonRootLCA does not exercise the code path its name and docstring claim. With approximately 6 total dirs-with-files and only 2 matching, the base_ratio lands near 0.33, which routes through

The remaining five panelists raised no required findings. The nits from supply-chain-security and oss-growth-hacker are stylistically sound and worth addressing on the next revision, but neither rises to a blocker. The auth-expert correctly recused. There is no inter-panelist disagreement to arbitrate on the facts; the single required finding stands uncontested.

Strategically, this PR is the right kind of investment: regression tests for ContextOptimizer protect a core differentiator against silent regressions as the project scales contributors. The community trust cost of merging a mislabeled test that provides false confidence outweighs the cost of one more revision cycle. Once the fixture or the class name is corrected, this should merge promptly.

Growth/positioning note: The oss-growth-hacker flags that "cached glob layer" and "lowest-common-ancestor placement" are opaque to external contributors. When the author revises for the required fix, encourage a CHANGELOG reframe along the lines of: "Regression tests now guard ContextOptimizer caching and import-placement correctness (#871)" -- this turns an internal implementation note into a contributor-legible signal that the project takes correctness seriously.

Per-persona findings (full)

Python Architect

classDiagram
direction LR
class ContextOptimizer {
<<Facade>>
+base_dir Path
+_glob_cache dict
+_directory_cache dict
+_file_list_cache list
+optimize_instruction_placement(instructions) dict
+_cached_glob(pattern) list
+_find_optimal_placements(instruction) list
+_solve_placement_optimization(instruction) list
+_optimize_single_point_placement(dirs, instruction) list
+_optimize_selective_placement(dirs, instruction) list
+_optimize_distributed_placement(dirs, instruction) list
+_find_minimal_coverage_placement(dirs) Path
+_calculate_distribution_score(dirs) float
}
note for ContextOptimizer "Strategy dispatch on distribution_score:\n<0.3 = SINGLE_POINT\n0.3-0.7 = SELECTIVE_MULTI\n>0.7 = DISTRIBUTED"
class PlacementCandidate {
<<ValueObject>>
+instruction Instruction
+directory Path
+total_score float
}
class DirectoryAnalysis {
<<ValueObject>>
+directory Path
+depth int
+total_files int
+pattern_matches dict
}
class Instruction {
<<ValueObject>>
+name str
+file_path Path
+apply_to str
+content str
}
class TestCachedGlobUsesFileList {
<<TestFixture>>
+test_cached_glob_caches_results()
}
class TestSinglePointPlacementNonRootLCA {
<<TestFixture>>
+test_lca_placement_is_non_root_when_matches_share_deep_subtree()
}
ContextOptimizer *-- DirectoryAnalysis : _directory_cache
ContextOptimizer *-- PlacementCandidate : generates
ContextOptimizer ..> Instruction : reads apply_to
TestCachedGlobUsesFileList ..> ContextOptimizer : exercises _cached_glob
TestSinglePointPlacementNonRootLCA ..> ContextOptimizer : exercises optimize_instruction_placement
TestSinglePointPlacementNonRootLCA ..> Instruction : constructs
class TestCachedGlobUsesFileList:::touched
class TestSinglePointPlacementNonRootLCA:::touched
classDef touched fill:#fff3b0,stroke:#d47600
flowchart TD
A(["test: optimize_instruction_placement\n[instruction with apply_to='Engine/Plugins/PCG*/**/*']"])
B["ContextOptimizer._analyze_project_structure\n[FS] os.walk base_dir -> _directory_cache"]
C["_solve_placement_optimization(instruction)"]
D["[FS] _cached_glob('Engine/Plugins/PCG*/**/*')\nos.chdir(base_dir) + glob.glob -> _glob_cache"]
E["_find_matching_directories(pattern)\nfilters _directory_cache by glob match"]
F{"_calculate_distribution_score\nmatching_dirs / total_dirs_with_files"}
G["ratio < 0.3 -> SINGLE_POINT\n_optimize_single_point_placement\n[NOT reached by current fixture]"]
H["ratio 0.3-0.7 -> SELECTIVE_MULTI\n_optimize_selective_placement\n[ACTUAL path taken: ratio ~0.33]"]
I["ratio > 0.7 -> DISTRIBUTED\nreturns base_dir"]
J["_find_minimal_coverage_placement(matching_dirs)\ncomputes LCA via common path prefix"]
K["LCA = base_dir/Engine/Plugins\nreturns Engine/Plugins"]
L["placement_map[Engine/Plugins] = [instruction]"]
M(["test assertion:\nplacement_dir.resolve().relative_to(base) == 'Engine/Plugins'"])
A --> B
B --> C
C --> D
D --> E
E --> F
F -- "ratio ~0.33 SELECTIVE" --> H
F -- "ratio < 0.3 SINGLE_POINT" --> G
F -- "ratio > 0.7 DISTRIBUTED" --> I
H --> J
G -.-> J
I --> L
J --> K
K --> L
L --> M
style G fill:#ffd6d6,stroke:#c00
style H fill:#d6f5d6,stroke:#060
Design patterns
Required: see above (1 item). Nits: see above (3 items).

CLI Logging Expert
No findings. This PR adds only tests and a changelog entry; no CLI output, CommandLogger, or console path is affected.

DevX UX Expert
No findings. No CLI command surface, help text, error wording, or user-facing flow is changed.

Supply Chain Security Expert
No required findings. Nit:

Auth Expert
Inactive -- PR only adds regression tests for ContextOptimizer compilation logic and a changelog entry; no auth, token, credential, or host-classification files are touched.

OSS Growth Hacker
No required findings. Nit: CHANGELOG entry uses internal implementation jargon ("cached glob layer", "lowest-common-ancestor placement") that is opaque to external contributors; a contributor-facing reframe would strengthen the quality signal.

Verdict computed deterministically: 1 required finding across 5 active panelists. APPROVE iff N == 0. Push a new commit to clear this verdict label automatically.

Note: 🔒 Integrity filter blocked 2 items
The following items were blocked because they don't meet the GitHub integrity level.
To allow these resources, lower

    tools:
      github:
        min-integrity: approved  # merged | approved | unapproved | none

…rden cache spy

The previous test class TestSinglePointPlacementNonRootLCA never reached _optimize_single_point_placement: with matching=2 and 6 dirs-with-files the distribution ratio was ~0.33, which routes through the SELECTIVE_MULTI tier (0.3-0.7). Rename it to honestly reflect what it exercises, and add a separate fixture (matching=2, total=8, ratio=0.25) that lands in the SINGLE_POINT tier (<0.3). Both classes now patch the relevant placement method as a side-effect spy to fail loudly if dispatch ever moves to a different tier.

Also folds the panel's nits:
- _cached_glob test now patches glob.glob with wraps and asserts call_count == 1 to encode the no-rescan guarantee.
- Switch from unittest+tempfile.mkdtemp to pytest+tmp_path so cleanup is automatic and temp files stay out of /tmp.
- Move CHANGELOG entry from ### Changed to ### Added and rephrase for external readers (no internal jargon).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
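
The tier arithmetic described in this commit message can be checked with a small sketch. It assumes the distribution score is the plain ratio of matching directories to directories that contain files, with the thresholds quoted in the panel's diagrams (below 0.3, 0.3 to 0.7, above 0.7); the real `_calculate_distribution_score` may apply further adjustments. `placement_tier` is a hypothetical name.

```python
def placement_tier(matching_dirs: int, total_dirs_with_files: int) -> str:
    """Map a matching/total directory ratio onto the three placement tiers."""
    if total_dirs_with_files <= 0:
        raise ValueError("total_dirs_with_files must be positive")
    ratio = matching_dirs / total_dirs_with_files
    if ratio < 0.3:
        return "SINGLE_POINT"
    if ratio <= 0.7:
        return "SELECTIVE_MULTI"
    return "DISTRIBUTED"


# The original fixture: 2 of 6 directories match (ratio ~0.33).
print(placement_tier(2, 6))
# The added fixture: 2 of 8 directories match (ratio 0.25).
print(placement_tier(2, 8))
```

Under these assumptions the two fixtures fall on opposite sides of the 0.3 boundary, which is why the original test never reached the single-point path.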
APM Review Panel Verdict: APPROVE
Required before merge (0 items)
None.

Nits (6 items, skip if you want)

CEO arbitration

PR #871 is a clean, targeted addition of regression tests covering two previously unguarded behaviors in ContextOptimizer: LCA-based instruction placement and glob result caching. All five active panelists returned zero required findings, and the inactive auth-expert correctly recused. The panel is unanimous that this PR is safe to merge as-is.

The python-architect raised five nits, all stylistic or coverage-extending in nature. The most actionable are: moving spy assertions inside the patch context manager for defensive hygiene, extracting a shared fixture helper to reduce structural duplication between the two LCA test classes, and adding a negative cache test to fully close the contract on

Strategically, this PR encodes a correctness guarantee that matters deeply to platform engineers evaluating AI-native tooling: narrow

Growth/positioning note: The correctness guarantee encoded here -- narrow

Per-persona findings (full)

Python Architect

classDiagram
class ContextOptimizer {
+str base_dir
-dict _glob_cache
-dict _glob_set_cache
-list _file_list_cache
-dict _directory_cache
-dict _pattern_cache
+optimize_instruction_placement(instructions) dict
+_cached_glob(pattern) list
-_optimize_single_point_placement(matching_dirs, instruction) list
-_optimize_selective_placement(matching_dirs, instruction) list
-_optimize_distributed_placement(matching_dirs, instruction) list
-_find_minimal_coverage_placement(matching_dirs) Path
-_calculate_distribution_score(matching_dirs) float
}
class DirectoryAnalysis {
+Path directory
+int file_count
+float match_ratio
}
class PlacementCandidate {
+Path directory
+float coverage_efficiency
+float pollution_score
+float total_score
}
class Instruction {
+str name
+Path file_path
+str apply_to
+str content
+str description
}
class TestCachedGlobUsesFileList {
+test_cached_glob_caches_results(tmp_path)
}
class TestSelectivePlacementNonRootLCA {
+test_lca_placement_is_non_root_for_selective_distribution(tmp_path)
}
class TestSinglePointPlacementNonRootLCA {
+test_lca_placement_is_non_root_for_low_distribution(tmp_path)
}
ContextOptimizer --> DirectoryAnalysis : analyses
ContextOptimizer --> PlacementCandidate : scores
ContextOptimizer ..> Instruction : optimizes placement for
TestCachedGlobUsesFileList ..> ContextOptimizer : exercises _cached_glob
TestSelectivePlacementNonRootLCA ..> ContextOptimizer : exercises _optimize_selective_placement
TestSinglePointPlacementNonRootLCA ..> ContextOptimizer : exercises _optimize_single_point_placement
class ContextOptimizer:::touched
class TestCachedGlobUsesFileList:::touched
class TestSelectivePlacementNonRootLCA:::touched
class TestSinglePointPlacementNonRootLCA:::touched
classDef touched fill:#fff3b0,stroke:#d47600
note for ContextOptimizer "Strategy pattern: distribution score routes to one of three placement strategies"
note for ContextOptimizer "Cache-aside: _glob_cache memoises glob.glob calls per pattern"
flowchart TD
A([pytest collects tests]) --> B[test_cached_glob_caches_results]
B --> B1["tmp_path/a.py created [FS]"] --> B2["ContextOptimizer.__init__(base_dir=tmp_path)"]
B2 --> B3["patch glob.glob with wraps=glob_module.glob"]
B3 --> B4["optimizer._cached_glob('**/*.py') - first call"]
B4 --> B5{pattern in _glob_cache?}
B5 -- No --> B6["chdir(base_dir) [FS]"] --> B7["glob.glob('**/*.py', recursive=True) [FS]"] --> B8["_glob_cache['**/*.py'] = result"]
B8 --> B10["optimizer._cached_glob('**/*.py') - second call"]
B10 --> B5b{pattern in _glob_cache?}
B5b -- Yes --> B11[return cached list]
B11 --> B12[assert first == second AND glob_spy.call_count == 1]
A --> C[test_lca_placement_is_non_root_for_selective_distribution]
C --> C1["_touch: Engine/Plugins/PCG*/Source/* [FS]"] --> C2["ContextOptimizer.__init__(base_dir=tmp_path)"]
C2 --> C3["Instruction(apply_to='Engine/Plugins/PCG*/**/*')"]
C3 --> C4["patch.object _optimize_selective_placement autospec+side_effect"]
C4 --> C5["optimizer.optimize_instruction_placement([instruction])"]
C5 --> C6["_cached_glob expand apply_to [FS]"] --> C7["_calculate_distribution_score: ratio~0.33"]
C7 --> C8{0.3 <= score <= 0.7?}
C8 -- Yes: SELECTIVE_MULTI --> C9["_optimize_selective_placement (spy fires)"]
C9 --> C10["_find_minimal_coverage_placement: LCA walk"]
C10 --> C11["returns Engine/Plugins"]
C11 --> C12["assert placement != tmp_path AND rel == 'Engine/Plugins'"]
A --> D[test_lca_placement_is_non_root_for_low_distribution]
D --> D1["_touch: 6 sibling dirs + PCG leaves [FS]"] --> D2["ContextOptimizer.__init__(base_dir=tmp_path)"]
D2 --> D3["Instruction(apply_to='Engine/Plugins/PCG*/**/*')"]
D3 --> D4["patch.object _optimize_single_point_placement autospec+side_effect"]
D4 --> D5["optimizer.optimize_instruction_placement([instruction])"]
D5 --> D6["_cached_glob expand apply_to [FS]"] --> D7["_calculate_distribution_score: ratio=0.25"]
D7 --> D8{score < 0.3?}
D8 -- Yes: SINGLE_POINT --> D9["_optimize_single_point_placement (spy fires)"]
D9 --> D10["_find_minimal_coverage_placement: LCA walk"]
D10 --> D11["returns Engine/Plugins"]
D11 --> D12["assert placement != tmp_path AND rel == 'Engine/Plugins'"]
Design patterns
Required: none. Nits: see aggregated section above.

CLI Logging Expert
No findings.

DevX UX Expert
No findings.

Supply Chain Security Expert
No findings.

Auth Expert
Inactive -- PR #871 only adds regression tests for ContextOptimizer placement logic and a CHANGELOG entry; no auth files (auth.py, token_manager.py, azure_cli.py, github_downloader.py, marketplace/client.py, github_host.py, install/validation.py, registry_proxy.py) are touched.

OSS Growth Hacker
Nits: CHANGELOG entry leads with "Regression tests" rather than the user-facing guarantee; consider reframing to lead with behavior ("Instructions now pin to the deepest relevant directory, not the project root") and then note regression test backing.
Side-channel: the subtree-locality correctness guarantee and the file-walk cache are both latent growth stories worth surfacing in future release notes or docs.

Verdict computed deterministically: 0 required findings across 5 active panelists. APPROVE iff N == 0. Push a new commit to clear this verdict label automatically.

Note: 🔒 Integrity filter blocked 2 items
The following items were blocked because they don't meet the GitHub integrity level.
To allow these resources, lower

    tools:
      github:
        min-integrity: approved  # merged | approved | unapproved | none

Depends on #870.
Build a single _directory_files_cache during project analysis and use it for all subsequent glob matching, eliminating repeated os.walk/iterdir() calls. Also fixes instruction files with narrow applyTo globs (e.g. Engine/Plugins/PCG*/**/*) incorrectly landing at ./ instead of their target subtree -- _optimize_low_distribution_placement now uses _find_minimal_coverage_placement (lowest common ancestor) instead of a pollution-scored search that biased toward root. Stats loop rewritten from O(N^2) to O(N).
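
To make the "walk once, match in memory" idea concrete, here is a rough sketch of a file-list cache with a small glob-to-regex translator. Everything in it is an assumption for illustration: `compile_glob`, `FileListCache`, and the sample paths are hypothetical, the translator covers only `*`, `?`, and `**`, and the PR's own `_glob_match` and cache layout are not reproduced here.

```python
import re
from collections import Counter
from functools import lru_cache
from pathlib import PurePosixPath
from typing import Dict, List


@lru_cache(maxsize=None)
def compile_glob(pattern: str) -> "re.Pattern[str]":
    """Translate a '/'-separated glob into a regex (illustrative subset only)."""
    out: List[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # one character within a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


class FileListCache:
    """Answer glob queries from a file list gathered once (hypothetical)."""

    def __init__(self, rel_files: List[str]) -> None:
        self._files = list(rel_files)
        self._glob_cache: Dict[str, List[str]] = {}

    def cached_glob(self, pattern: str) -> List[str]:
        if pattern not in self._glob_cache:
            regex = compile_glob(pattern)
            self._glob_cache[pattern] = [f for f in self._files if regex.match(f)]
        return self._glob_cache[pattern]

    def dir_match_counts(self, pattern: str) -> Dict[str, int]:
        # Derive directory hits from the parents of matched files.
        parents = (PurePosixPath(f).parent.as_posix() for f in self.cached_glob(pattern))
        return dict(Counter(parents))


files = [
    "README.md",
    "Engine/Plugins/PCGCore/Source/a.cpp",
    "Engine/Plugins/PCGCore/Source/b.cpp",
    "Engine/Plugins/PCGExtras/Source/c.cpp",
    "Engine/Plugins/Other/Source/d.cpp",
]
cache = FileListCache(files)
counts = cache.dir_match_counts("Engine/Plugins/PCG*/**/*")
print(counts)
```

In a real project the `files` list would come from a single pruned `os.walk`, after which no query touches the disk again.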