perf(compile): cache file walk and fix placement for narrow patterns by Roozi489 · Pull Request #871 · microsoft/apm

Roozi489 · 2026-04-23T09:31:33Z

Depends on #870.

Build a single _directory_files_cache during project analysis and use it for all subsequent glob matching, eliminating repeated os.walk/iterdir() calls. Also fixes instruction files with narrow applyTo globs (e.g. Engine/Plugins/PCG*/**/*) incorrectly landing at ./ instead of their target subtree -- _optimize_low_distribution_placement now uses _find_minimal_coverage_placement (lowest common ancestor) instead of a pollution-scored search that biased toward root. Stats loop rewritten from O(N^2) to O(N).

Copilot

Pull request overview

This PR improves apm compile performance and correctness by caching filesystem traversal results and adjusting low-distribution instruction placement to use lowest-common-ancestor coverage, reducing repeated os.walk/iterdir() work and avoiding incorrect root placements for narrow applyTo patterns.

Changes:

Switch primitive discovery find_primitive_files() from glob.glob(recursive=True) to os.walk with early directory pruning and shared skip dirs.
Add DEFAULT_SKIP_DIRS constant and use it to prune traversal in both primitives discovery and compilation analysis.
Update ContextOptimizer to cache per-directory file lists and use them for glob matching, directory matching, and stats computation; adjust low-distribution placement to use minimal-coverage (LCA) placement.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
tests/unit/primitives/test_discovery_walk.py	Adds unit tests for the new walk-based discovery helpers and verifies ContextOptimizer’s cached glob behavior.
src/apm_cli/primitives/discovery.py	Implements `_glob_match` and updates `find_primitive_files()` to walk/prune instead of `glob.glob()`.
src/apm_cli/constants.py	Introduces `DEFAULT_SKIP_DIRS` to centralize unconditional traversal skips.
src/apm_cli/compilation/context_optimizer.py	Adds `_directory_files_cache` and rewires `_cached_glob` / matching / stats; changes single-point placement to use LCA.
CHANGELOG.md	Documents the behavior/perf changes under Unreleased.

Copilot · 2026-04-23T09:41:15Z

+        Strategy: Place at the lowest common ancestor of all matching directories.
+        This is the most specific directory that still provides full hierarchical
+        coverage, avoiding pollution of unrelated subtrees.
        """
-        candidates = self._generate_all_candidates(matching_directories, instruction)
+        # Find the deepest directory that covers all matches
+        minimal_coverage = self._find_minimal_coverage_placement(matching_directories)
+        if minimal_coverage and minimal_coverage in self._directory_cache:
+            return [minimal_coverage]


_optimize_single_point_placement() now uses lowest-common-ancestor placement, which is the crux of the bugfix described in the PR. There are existing tests for single-point placement and for LCA-at-root sibling coverage, but there is no test that asserts a non-root LCA is chosen when multiple matching directories exist under a deep subtree (the regression you mention like Engine/Plugins/PCG*/**/*). Add a unit test that constructs such a tree and asserts the placement directory is that subtree LCA, not the project root.

Copilot · 2026-04-23T09:41:15Z

+### Changed
+
+- `find_primitive_files()` now uses `os.walk` with early directory pruning instead of `glob.glob(recursive=True)`, so `compilation.exclude` patterns prevent traversal into expensive subtrees (e.g. large game-engine asset trees). Adds shared `DEFAULT_SKIP_DIRS` constant used by both discovery and compilation.
+- `ContextOptimizer` now builds a `_directory_files_cache` during `_analyze_project_structure()` and uses it for `_cached_glob`, `_find_matching_directories`, and `_directory_matches_pattern`, eliminating repeated `os.walk` / `iterdir()` calls. Directory pruning is moved before the depth check so excluded subtrees are never descended into. Stats loop rewritten from O(N^2) per-directory inheritance walk to O(N) with pre-computed pattern sets. Low-distribution placement now uses `_find_minimal_coverage_placement` directly instead of a pollution-scored candidate search that incorrectly biased toward the project root.


The new changelog bullets under ## [Unreleased] do not follow the repo format rules: each entry should end with a PR reference like (#<number>), and be one logical change per line. Please add the appropriate PR number(s) to these entries (and consider splitting the long ContextOptimizer bullet if needed).

Copilot · 2026-04-23T09:41:15Z

+        # Match files against the requested patterns
+        for file_name in files:
+            file_path = current / file_name
+            rel_str = str(file_path.relative_to(base_path)).replace(os.sep, '/')
+            for pattern in patterns:
+                if _glob_match(rel_str, pattern):
+                    all_files.append(file_path)
+                    break


find_primitive_files() iterates files from os.walk() without sorting, which can make primitive discovery order nondeterministic across filesystems. Because local primitives with the same name are conflict-resolved by first-seen order, this can lead to different winners on different machines. Sort files (and ideally return a consistently sorted valid_files) to preserve deterministic behavior similar to the old glob.glob() path ordering.

Copilot · 2026-04-23T09:41:15Z

-
    dir_name = os.path.basename(dir_path)
-    return dir_name in skip_patterns
+    return dir_name in DEFAULT_SKIP_DIRS


_should_skip_directory() is now unused (no references in the repo) and duplicates the DEFAULT_SKIP_DIRS check already done inline during traversal. Consider removing it to avoid dead code, or reintroduce a call site if it is meant to be the canonical skip check.

Copilot · 2026-04-23T09:41:16Z

+# These never contain APM primitives or user source files and can be
+# very large (e.g. node_modules, .git objects).  Used by discovery,
+# compilation, and content hashing to avoid expensive walks.
+# NOTE: .apm is intentionally absent -- it is where primitives live.


The comment for DEFAULT_SKIP_DIRS says these directories "never contain APM primitives", but apm_modules/ can contain dependency primitives (it is just intentionally excluded from local traversal). Reword the comment to reflect that these dirs are skipped because they are not relevant for project-source analysis / local primitive discovery. Also consider updating the docs default-exclusions list (e.g. reference/cli-commands.md) since venv/ is now an unconditional skip.

Suggested change

# These never contain APM primitives or user source files and can be

# very large (e.g. node_modules, .git objects). Used by discovery,

# compilation, and content hashing to avoid expensive walks.

# NOTE: .apm is intentionally absent -- it is where primitives live.

# These directories are not relevant to project-source analysis or local

# primitive discovery and may be very large (e.g. node_modules, .git

# objects). Used by discovery, compilation, and content hashing to avoid

# expensive walks.

# NOTE: .apm is intentionally absent because local project primitives may

# live there and should still be discovered.

Roozi489 · 2026-04-23T10:35:29Z

@microsoft-github-policy-service agree

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Copilot · 2026-04-26T14:10:50Z

@@ -1,6 +1,6 @@
-"""Context Optimizer for APM distributed compilation system.
+"""Context Optimizer for APM distributed compilation system.


File contains a UTF-8 BOM / non-ASCII character at the start of the module (before the opening triple-quote). Repo policy requires printable ASCII only; this can also cause subtle tooling diffs and Windows encoding issues. Remove the BOM so the first character is a plain ASCII '"'.

Copilot · 2026-04-26T14:10:51Z

+            # Fast path: build the glob match set for each expanded pattern,
+            # then derive directory→count from matched file parents.
+            dir_match_counts: Dict[Path, int] = {}


Non-ASCII Unicode arrow character ('→') appears in a source comment. The project requires printable ASCII-only output/source to avoid Windows cp1252 failures. Replace it with an ASCII sequence like '->'.

Copilot · 2026-04-26T14:10:51Z

+            dir_prefix = directory
            direct_children = [
-                child for child in directory.iterdir() 
-                if child.is_dir() and child in self._directory_cache
+                child_dir for child_dir in self._directory_cache
+                if child_dir.parent == dir_prefix and child_dir != dir_prefix
            ]


The new direct-children lookup iterates over all entries in self._directory_cache each time _calculate_inheritance_pollution runs. Since this method is called for each placement candidate, this can reintroduce O(N^2) behavior on large repos (trading syscalls for Python-level full scans). Consider building a children_by_parent: Dict[Path, List[Path]] adjacency map once during _analyze_project_structure() and reading direct_children from that map here.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Copilot · 2026-04-26T15:25:35Z

+        pattern_dir_sets: Dict[str, set] = {}
+        for instruction in all_instructions:
+            pattern = instruction.apply_to
+            if pattern and pattern not in pattern_dir_sets:
+                pattern_dir_sets[pattern] = self._pattern_cache.get(pattern, set())


pattern_dir_sets is annotated as Dict[str, set], which loses the element type and makes the later membership check (directory in ...) harder to reason about. Consider tightening this to dict[str, set[Path]] (or Dict[str, Set[Path]]) to match what _pattern_cache actually stores.

Copilot · 2026-04-26T15:25:36Z


 - HYBRID-skill review pipeline: `apm-review-panel` now produces a single CEO-synthesized verdict per run (no per-persona spam), with Hybrid E auth-expert routing and `python-architect`'s mandatory three-artifact contract. PRs get one high-signal comment. (#882, #905, #907, #908)
 - Faster primitive discovery on large repos: `compilation.exclude` patterns now prune traversal at the directory level instead of post-filtering. (#870)
+- `ContextOptimizer` reuses a cached per-directory file list built during project analysis for glob matching, directory matching, and stats, eliminating repeated `os.walk` / `iterdir()` calls and rewriting the stats loop from O(N^2) to O(N). Low-distribution `applyTo` patterns are now placed at their lowest common ancestor instead of the project root. (#871)


This changelog entry was added under the already-versioned ## [0.9.3] - 2026-04-26 section. Per Keep a Changelog conventions used in this repo, new PR entries should go under ## [Unreleased] until the release is cut. Please move this bullet to the Unreleased section (unless you are intentionally editing historical release notes as part of a release PR).

Copilot · 2026-04-26T15:25:36Z

+        # Fallback: walk up from minimal_coverage until we find a cached directory
+        if minimal_coverage:
+            current = minimal_coverage
+            while current != self.base_dir:
+                if current in self._directory_cache:
+                    return [current]
+                current = current.parent


In the fallback loop that walks upward from minimal_coverage, the condition is only while current != self.base_dir:. If self.base_dir is not in current's ancestor chain (e.g., due to path resolution differences) and current reaches the filesystem root (current.parent == current), this will never terminate. Add a root/parent-stability guard (and/or compare on resolved paths) to guarantee termination.

Suggested change

# Fallback: walk up from minimal_coverage until we find a cached directory

if minimal_coverage:

current = minimal_coverage

while current != self.base_dir:

if current in self._directory_cache:

return [current]

current = current.parent

# Fallback: walk up from minimal_coverage until we find a cached directory.

# Compare resolved paths to avoid mismatches caused by different path

# representations, and stop if we reach a filesystem root where parent

# traversal no longer makes progress.

if minimal_coverage:

current = minimal_coverage

resolved_base_dir = self.base_dir.resolve()

while current.resolve() != resolved_base_dir:

if current in self._directory_cache:

return [current]

parent = current.parent

if parent == current:

break

current = parent

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Copilot · 2026-04-27T07:43:00Z

+        """Second call with same pattern returns cached result."""
+        (self.base / "a.py").touch()
+        optimizer = ContextOptimizer(base_dir=str(self.base))
+        first = optimizer._cached_glob("**/*.py")
+        second = optimizer._cached_glob("**/*.py")
+        self.assertIs(first, second)
+


This test asserts caching via object identity (assertIs(first, second)), which is brittle: the implementation could legitimately return an equivalent (but different) list instance while still being correctly cached. Prefer asserting on equality and/or verifying the second call does not rebuild (e.g., by inspecting _glob_cache or using a spy/mock) rather than requiring the same list object.

Suggested change

"""Second call with same pattern returns cached result."""

(self.base / "a.py").touch()

optimizer = ContextOptimizer(base_dir=str(self.base))

first = optimizer._cached_glob("**/*.py")

second = optimizer._cached_glob("**/*.py")

self.assertIs(first, second)

"""Second call with same pattern reuses cached glob data."""

(self.base / "a.py").touch()

optimizer = ContextOptimizer(base_dir=str(self.base))

first = optimizer._cached_glob("**/*.py")

second = optimizer._cached_glob("**/*.py")

self.assertEqual(first, second)

self.assertIn("**/*.py", optimizer._glob_cache)

self.assertEqual(first, optimizer._glob_cache["**/*.py"])

Copilot · 2026-04-27T07:43:01Z

+            # Analyze files in this directory and cache file paths
+            dir_files = []
+            for file in files:
+                if file.startswith('.'):
+                    continue

+                file_path = current_path / file
+                dir_files.append(file_path)
+
+            if dir_files:
+                self._directory_files_cache[current_path] = dir_files
+


_analyze_project_structure() applies exclude_patterns only at the directory level. Files that match a file-level exclude pattern (e.g. "/*.dll" or "/*.generated.h") will still be added to dir_files/_directory_files_cache, which means _cached_glob() and downstream matching/placement can still "see" excluded files. Filter file_path with _should_exclude_path(file_path) (or should_exclude(file_path, ...)) before adding it to dir_files, and add/adjust a unit test covering file-level excludes.

Copilot · 2026-04-27T07:43:01Z

@@ -14,9 +14,9 @@
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Set, Tuple
 from functools import lru_cache


lru_cache is imported but not used anywhere in this module. Please remove the unused import to avoid confusion and keep the file clean.

Suggested change

from functools import lru_cache

Build a _directory_files_cache during _analyze_project_structure() so that _cached_glob(), _find_matching_directories(), and _directory_matches_pattern() all work from the same in-memory file list instead of issuing repeated os.walk / iterdir() calls against disk. Key changes in context_optimizer.py: - _analyze_project_structure: populate _directory_files_cache[dir] for later use; dirs[:] pruning runs BEFORE depth/exclusion checks so os.walk never descends into excluded subtrees - _cached_glob: replaces glob.glob(cwd=base_dir) with a scan of _directory_files_cache using _glob_match() from discovery.py - _find_matching_directories: fast path for ** patterns derives directory hits from the cached glob set (no iterdir()); slow path for non-recursive patterns iterates cached files - _calculate_optimization_stats: rewrite O(N^2) efficiency loop to O(N) using pre-computed pattern_dir_sets from _pattern_cache - _optimize_low_distribution_placement: go straight to _find_minimal_coverage_placement (lowest common ancestor) instead of the pollution-scored candidate search that biased toward root; fixes instruction files for narrow applyTo globs landing at ./ when all matching files live under a specific subtree - Drop local DEFAULT_EXCLUDED_DIRNAMES; use DEFAULT_SKIP_DIRS from constants (introduced in perf/discovery-prune) Tests: TestCachedGlobUsesFileList (4 tests) and TestSinglePointPlacementNonRootLCA (regression for narrow applyTo globs) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The PR was rebased onto main, which already contains the cache implementation under a different attribute name (_directory_cache, not _directory_files_cache). Strip the four tests that reference the absent attribute / file-level exclude behaviour, keep the two regression tests that exercise behaviour actually present in main: _glob_cache reuse and non-root LCA placement for narrow applyTo patterns. Update CHANGELOG entry to reflect the regression-coverage scope (microsoft#871). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-04-30T19:39:20Z

APM Review Panel Verdict: REJECT

The panel reached consensus on a single blocking finding: the test class that claims to exercise _optimize_single_point_placement actually routes through _optimize_selective_placement due to a fixture ratio mismatch. One revision cycle closes it.

Required before merge (1 item)

[python-architect] TestSinglePointPlacementNonRootLCA does not exercise _optimize_single_point_placement at tests/unit/compilation/test_context_optimizer_cache_and_placement.py
- Why: With 2 matching dirs (Engine/Plugins/PCG/Source, Engine/Plugins/PCGExtra/Source) out of ~6 total dirs-with-files (Source, Content, Config, Docs + the two PCG leaves), the distribution ratio is ~0.33, which lands in the SELECTIVE_MULTI tier (0.3-0.7) and routes through _optimize_selective_placement -- not through _optimize_single_point_placement (lines 856-897). A bug confined to the candidate-generation loop in _optimize_single_point_placement (e.g. the coverage_candidates filter at lines 872-881) would not be caught by this test. The class docstring and name assert otherwise, creating false confidence.
- Suggested fix: Either reduce the fixture's sibling dir count so the ratio drops below 0.3 (add more sibling directories, keeping matching=2 but total>7), or add a fourth PCG-family leaf so matching=3 with total=5 (ratio 0.6, stays SELECTIVE but broadens LCA coverage). Alternatively rename the class to TestSelectivePlacementNonRootLCA and update its docstring to honestly reflect the SELECTIVE_MULTI code path being exercised, then add a separate fixture for the SINGLE_POINT path.

Nits (5 items, skip if you want)

[python-architect] test_cached_glob_caches_results does not verify the no-rescan guarantee -- if _cached_glob re-ran glob.glob on every call but still stored results, assertions would pass. Consider: with unittest.mock.patch('glob.glob', wraps=glob.glob) as spy: ... self.assertEqual(spy.call_count, 1) at tests/unit/compilation/test_context_optimizer_cache_and_placement.py
[python-architect] Both tearDown methods use a lazy import shutil inside the method body; move to module top and replace with self.addCleanup(shutil.rmtree, self.tmp, ignore_errors=True) in setUp at tests/unit/compilation/test_context_optimizer_cache_and_placement.py
[python-architect] CHANGELOG entry placed in ### Changed alongside BREAKING changes; regression test additions belong under ### Tests or ### Added per Keep a Changelog conventions at CHANGELOG.md
[supply-chain-security-expert] tempfile.mkdtemp() writes to /tmp -- consider pytest's tmp_path fixture or a project-local temp dir for CI hygiene on shared runners at tests/unit/compilation/test_context_optimizer_cache_and_placement.py
[oss-growth-hacker] CHANGELOG entry uses internal implementation jargon ("cached glob layer", "lowest-common-ancestor placement") opaque to external contributors -- consider a reframe like "Regression tests now guard ContextOptimizer caching and import-placement correctness (perf(compile): cache file walk and fix placement for narrow patterns #871)" at CHANGELOG.md

CEO arbitration

The panel reached consensus on a single required finding, raised exclusively by the python-architect: TestSinglePointPlacementNonRootLCA does not exercise the code path its name and docstring claim. With approximately 6 total dirs-with-files and only 2 matching, the base_ratio lands near 0.33, which routes through _optimize_selective_placement rather than _optimize_single_point_placement. This is not a naming pedantry -- a bug confined to the candidate-generation filter in _optimize_single_point_placement (lines 872-881) would survive this test suite entirely. The fix is surgical: either restructure the fixture so the ratio drops below 0.3, or rename the class and docstring to honestly reflect the SELECTIVE_MULTI path being exercised.

The remaining five panelists raised no required findings. The nits from supply-chain-security and oss-growth-hacker are stylistically sound and worth addressing on the next revision, but neither rises to a blocker. The auth-expert correctly recused. There is no inter-panelist disagreement to arbitrate on the facts; the single required finding stands uncontested.

Strategically, this PR is the right kind of investment: regression tests for ContextOptimizer protect a core differentiator against silent regressions as the project scales contributors. The community trust cost of merging a mislabeled test that provides false confidence outweighs the cost of one more revision cycle. Once the fixture or the class name is corrected, this should merge promptly.

Growth/positioning note: The oss-growth-hacker flags that "cached glob layer" and "lowest-common-ancestor placement" are opaque to external contributors. When the author revises for the required fix, encourage a CHANGELOG reframe along the lines of: "Regression tests now guard ContextOptimizer caching and import-placement correctness (#871)" -- this turns an internal implementation note into a contributor-legible signal that the project takes correctness seriously.

Per-persona findings (full)

Python Architect

classDiagram
    direction LR

    class ContextOptimizer {
        <<Facade>>
        +base_dir Path
        +_glob_cache dict
        +_directory_cache dict
        +_file_list_cache list
        +optimize_instruction_placement(instructions) dict
        +_cached_glob(pattern) list
        +_find_optimal_placements(instruction) list
        +_solve_placement_optimization(instruction) list
        +_optimize_single_point_placement(dirs, instruction) list
        +_optimize_selective_placement(dirs, instruction) list
        +_optimize_distributed_placement(dirs, instruction) list
        +_find_minimal_coverage_placement(dirs) Path
        +_calculate_distribution_score(dirs) float
    }
    note for ContextOptimizer "Strategy dispatch on distribution_score:\n<0.3 = SINGLE_POINT\n0.3-0.7 = SELECTIVE_MULTI\n>0.7 = DISTRIBUTED"

    class PlacementCandidate {
        <<ValueObject>>
        +instruction Instruction
        +directory Path
        +total_score float
    }

    class DirectoryAnalysis {
        <<ValueObject>>
        +directory Path
        +depth int
        +total_files int
        +pattern_matches dict
    }

    class Instruction {
        <<ValueObject>>
        +name str
        +file_path Path
        +apply_to str
        +content str
    }

    class TestCachedGlobUsesFileList {
        <<TestFixture>>
        +test_cached_glob_caches_results()
    }

    class TestSinglePointPlacementNonRootLCA {
        <<TestFixture>>
        +test_lca_placement_is_non_root_when_matches_share_deep_subtree()
    }

    ContextOptimizer *-- DirectoryAnalysis : _directory_cache
    ContextOptimizer *-- PlacementCandidate : generates
    ContextOptimizer ..> Instruction : reads apply_to
    TestCachedGlobUsesFileList ..> ContextOptimizer : exercises _cached_glob
    TestSinglePointPlacementNonRootLCA ..> ContextOptimizer : exercises optimize_instruction_placement
    TestSinglePointPlacementNonRootLCA ..> Instruction : constructs

    class TestCachedGlobUsesFileList:::touched
    class TestSinglePointPlacementNonRootLCA:::touched
    classDef touched fill:#fff3b0,stroke:#d47600

flowchart TD
    A(["test: optimize_instruction_placement\n[instruction with apply_to='Engine/Plugins/PCG*/**/*']"])
    B["ContextOptimizer._analyze_project_structure\n[FS] os.walk base_dir -> _directory_cache"]
    C["_solve_placement_optimization(instruction)"]
    D["[FS] _cached_glob('Engine/Plugins/PCG*/**/*')\nos.chdir(base_dir) + glob.glob -> _glob_cache"]
    E["_find_matching_directories(pattern)\nfilters _directory_cache by glob match"]
    F{"_calculate_distribution_score\nmatching_dirs / total_dirs_with_files"}
    G["ratio < 0.3 -> SINGLE_POINT\n_optimize_single_point_placement\n[NOT reached by current fixture]"]
    H["ratio 0.3-0.7 -> SELECTIVE_MULTI\n_optimize_selective_placement\n[ACTUAL path taken: ratio ~0.33]"]
    I["ratio > 0.7 -> DISTRIBUTED\nreturns base_dir"]
    J["_find_minimal_coverage_placement(matching_dirs)\ncomputes LCA via common path prefix"]
    K["LCA = base_dir/Engine/Plugins\nreturns Engine/Plugins"]
    L["placement_map[Engine/Plugins] = [instruction]"]
    M(["test assertion:\nplacement_dir.resolve().relative_to(base) == 'Engine/Plugins'"])

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F -- "ratio ~0.33 SELECTIVE" --> H
    F -- "ratio < 0.3 SINGLE_POINT" --> G
    F -- "ratio > 0.7 DISTRIBUTED" --> I
    H --> J
    G -.-> J
    I --> L
    J --> K
    K --> L
    L --> M

    style G fill:#ffd6d6,stroke:#c00
    style H fill:#d6f5d6,stroke:#060

Design patterns

Used in this PR: none -- the test file is straight-line procedural unittest code; it exercises the Strategy dispatch inside ContextOptimizer but does not introduce a pattern itself.
Pragmatic suggestion: Parameterized test via subTest -- applying it to test_lca_placement_is_non_root_when_matches_share_deep_subtree would let a single test body verify LCA correctness across all three distribution tiers (SINGLE_POINT, SELECTIVE_MULTI, DISTRIBUTED fallback), directly closing the coverage gap flagged in required without tripling the fixture code.

Required: see above (1 item). Nits: see above (3 items).

CLI Logging Expert

No findings. This PR adds only tests and a changelog entry; no CLI output, CommandLogger, or console path is affected.

DevX UX Expert

No findings. No CLI command surface, help text, error wording, or user-facing flow is changed.

Supply Chain Security Expert

No required findings. Nit: tempfile.mkdtemp() in both test classes writes to /tmp; consider pytest's tmp_path fixture for CI hygiene on shared runners.

Auth Expert

Inactive -- PR only adds regression tests for ContextOptimizer compilation logic and a changelog entry; no auth, token, credential, or host-classification files are touched.

OSS Growth Hacker

No required findings. Nit: CHANGELOG entry uses internal implementation jargon ("cached glob layer", "lowest-common-ancestor placement") that is opaque to external contributors; a contributor-facing reframe would strengthen the quality signal.

_{Verdict computed deterministically: 1 required finding across 5 active panelists. APPROVE iff N == 0. Push a new commit to clear this verdict label automatically.}

Note

🔒 Integrity filter blocked 2 items

The following items were blocked because they don't meet the GitHub integrity level.

#871 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
perf(compile): cache file walk and fix placement for narrow patterns #871 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by PR Review Panel for issue #871 · ● 1.7M · ◷

…rden cache spy The previous test class TestSinglePointPlacementNonRootLCA never reached _optimize_single_point_placement: with matching=2 and 6 dirs-with-files the distribution ratio was ~0.33, which routes through the SELECTIVE_MULTI tier (0.3-0.7). Rename it to honestly reflect what it exercises, and add a separate fixture (matching=2, total=8, ratio=0.25) that lands in the SINGLE_POINT tier (<0.3). Both classes now patch the relevant placement method as a side-effect spy to fail loudly if dispatch ever moves to a different tier. Also folds the panel's nits: - _cached_glob test now patches glob.glob with wraps and asserts call_count == 1 to encode the no-rescan guarantee. - Switch from unittest+tempfile.mkdtemp to pytest+tmp_path so cleanup is automatic and temp files stay out of /tmp. - Move CHANGELOG entry from ### Changed to ### Added and rephrase for external readers (no internal jargon). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-04-30T20:30:49Z

APM Review Panel Verdict: APPROVE

All five active specialists agreed: zero required findings. The regression test suite is clean, well-scoped, and safe to merge.

Required before merge (0 items)

None.

Nits (6 items, skip if you want)

[python-architect] test_cached_glob_caches_results: assertions on glob_spy.call_count run after the with patch(...) block exits; move them inside the block for defensive hygiene (spy still active during assertions here, but the pattern is misleading).
[python-architect] The two LCA test classes (TestSelectivePlacementNonRootLCA, TestSinglePointPlacementNonRootLCA) are structurally identical except for number of sibling dirs and spy target; extract a shared helper _build_pcg_fixture(tmp_path, extra_dirs) to make the ratio intent explicit and reduce duplication.
[python-architect] Class docstrings describe the tier but do not state the expected numeric ratio; add e.g. ratio ~ 0.33 inline so the tier boundary is self-documenting without reading the fixture comments.
[python-architect] _touch silently succeeds if the file already exists; consider renaming to _make_file for a more Pythonic name that doesn't import the Unix touch mental model.
[python-architect] No negative test for _cached_glob: a test asserting that two different patterns each invoke glob.glob exactly once would complete the cache-contract coverage.
[oss-growth-hacker] CHANGELOG entry leads with "Regression tests" (an implementation detail) rather than the user-facing guarantee; consider reframing to "Instructions now pin to the deepest relevant directory instead of being hoisted to the project root" and then noting regression test backing.

CEO arbitration

PR #871 is a clean, targeted addition of regression tests covering two previously unguarded behaviors in ContextOptimizer: LCA-based instruction placement and glob result caching. All five active panelists returned zero required findings, and the inactive auth-expert correctly recused. The panel is unanimous that this PR is safe to merge as-is.

The python-architect raised five nits, all stylistic or coverage-extending in nature. The most actionable are: moving spy assertions inside the patch context manager for defensive hygiene, extracting a shared fixture helper to reduce structural duplication between the two LCA test classes, and adding a negative cache test to fully close the contract on _cached_glob. None of these block merge; they are recommended as follow-up housekeeping. The oss-growth-hacker correctly identified that the CHANGELOG framing leads with implementation detail rather than user-facing guarantee -- a minor reframe to "Instructions now pin to the deepest relevant directory" would improve changelog legibility for adopters scanning for behavior changes.

Strategically, this PR encodes a correctness guarantee that matters deeply to platform engineers evaluating AI-native tooling: narrow applyTo patterns respect subtree locality and do not silently hoist to project root. That guarantee, now regression-tested, belongs in product-level communications and the "why APM" narrative, not just in the test suite.

Growth/positioning note: The correctness guarantee encoded here -- narrow applyTo patterns respect subtree locality -- is a strong trust signal for AI-native tooling users for whom context scoping is foundational. If a blog post or docs page covers "how APM decides where to apply instructions," this regression suite is proof of rigor. The file-walk cache is a latent performance story: if benchmark numbers exist, that is concrete and shareable.

Per-persona findings (full)

Python Architect

classDiagram
    class ContextOptimizer {
        +str base_dir
        -dict _glob_cache
        -dict _glob_set_cache
        -list _file_list_cache
        -dict _directory_cache
        -dict _pattern_cache
        +optimize_instruction_placement(instructions) dict
        +_cached_glob(pattern) list
        -_optimize_single_point_placement(matching_dirs, instruction) list
        -_optimize_selective_placement(matching_dirs, instruction) list
        -_optimize_distributed_placement(matching_dirs, instruction) list
        -_find_minimal_coverage_placement(matching_dirs) Path
        -_calculate_distribution_score(matching_dirs) float
    }
    class DirectoryAnalysis {
        +Path directory
        +int file_count
        +float match_ratio
    }
    class PlacementCandidate {
        +Path directory
        +float coverage_efficiency
        +float pollution_score
        +float total_score
    }
    class Instruction {
        +str name
        +Path file_path
        +str apply_to
        +str content
        +str description
    }
    class TestCachedGlobUsesFileList {
        +test_cached_glob_caches_results(tmp_path)
    }
    class TestSelectivePlacementNonRootLCA {
        +test_lca_placement_is_non_root_for_selective_distribution(tmp_path)
    }
    class TestSinglePointPlacementNonRootLCA {
        +test_lca_placement_is_non_root_for_low_distribution(tmp_path)
    }

    ContextOptimizer --> DirectoryAnalysis : analyses
    ContextOptimizer --> PlacementCandidate : scores
    ContextOptimizer ..> Instruction : optimizes placement for
    TestCachedGlobUsesFileList ..> ContextOptimizer : exercises _cached_glob
    TestSelectivePlacementNonRootLCA ..> ContextOptimizer : exercises _optimize_selective_placement
    TestSinglePointPlacementNonRootLCA ..> ContextOptimizer : exercises _optimize_single_point_placement

    class ContextOptimizer:::touched
    class TestCachedGlobUsesFileList:::touched
    class TestSelectivePlacementNonRootLCA:::touched
    class TestSinglePointPlacementNonRootLCA:::touched
    classDef touched fill:#fff3b0,stroke:#d47600

    note for ContextOptimizer "Strategy pattern: distribution score routes to one of three placement strategies"
    note for ContextOptimizer "Cache-aside: _glob_cache memoises glob.glob calls per pattern"

flowchart TD
    A([pytest collects tests]) --> B[test_cached_glob_caches_results]
    B --> B1["tmp_path/a.py created [FS]"] --> B2["ContextOptimizer.__init__(base_dir=tmp_path)"]
    B2 --> B3["patch glob.glob with wraps=glob_module.glob"]
    B3 --> B4["optimizer._cached_glob('**/*.py') - first call"]
    B4 --> B5{pattern in _glob_cache?}
    B5 -- No --> B6["chdir(base_dir) [FS]"] --> B7["glob.glob('**/*.py', recursive=True) [FS]"] --> B8["_glob_cache['**/*.py'] = result"]
    B8 --> B10["optimizer._cached_glob('**/*.py') - second call"]
    B10 --> B5b{pattern in _glob_cache?}
    B5b -- Yes --> B11[return cached list]
    B11 --> B12[assert first == second AND glob_spy.call_count == 1]

    A --> C[test_lca_placement_is_non_root_for_selective_distribution]
    C --> C1["_touch: Engine/Plugins/PCG*/Source/* [FS]"] --> C2["ContextOptimizer.__init__(base_dir=tmp_path)"]
    C2 --> C3["Instruction(apply_to='Engine/Plugins/PCG*/**/*')"]
    C3 --> C4["patch.object _optimize_selective_placement autospec+side_effect"]
    C4 --> C5["optimizer.optimize_instruction_placement([instruction])"]
    C5 --> C6["_cached_glob expand apply_to [FS]"] --> C7["_calculate_distribution_score: ratio~0.33"]
    C7 --> C8{0.3 <= score <= 0.7?}
    C8 -- Yes: SELECTIVE_MULTI --> C9["_optimize_selective_placement (spy fires)"]
    C9 --> C10["_find_minimal_coverage_placement: LCA walk"]
    C10 --> C11["returns Engine/Plugins"]
    C11 --> C12["assert placement != tmp_path AND rel == 'Engine/Plugins'"]

    A --> D[test_lca_placement_is_non_root_for_low_distribution]
    D --> D1["_touch: 6 sibling dirs + PCG leaves [FS]"] --> D2["ContextOptimizer.__init__(base_dir=tmp_path)"]
    D2 --> D3["Instruction(apply_to='Engine/Plugins/PCG*/**/*')"]
    D3 --> D4["patch.object _optimize_single_point_placement autospec+side_effect"]
    D4 --> D5["optimizer.optimize_instruction_placement([instruction])"]
    D5 --> D6["_cached_glob expand apply_to [FS]"] --> D7["_calculate_distribution_score: ratio=0.25"]
    D7 --> D8{score < 0.3?}
    D8 -- Yes: SINGLE_POINT --> D9["_optimize_single_point_placement (spy fires)"]
    D9 --> D10["_find_minimal_coverage_placement: LCA walk"]
    D10 --> D11["returns Engine/Plugins"]
    D11 --> D12["assert placement != tmp_path AND rel == 'Engine/Plugins'"]

Design patterns

Used in this PR: Strategy -- optimize_instruction_placement routes to _optimize_single_point_placement, _optimize_selective_placement, or _optimize_distributed_placement based on distribution score; the tests verify that the correct strategy branch is entered via spy. Cache-aside -- _cached_glob memoises glob.glob results in _glob_cache; the first test pins the invariant of exactly one filesystem scan per unique pattern. Test-spy over autospec -- both LCA tests use patch.object(..., autospec=True, side_effect=original) to observe dispatch without replacing behavior.
Pragmatic suggestion: none -- the current test shape is the simplest correct design at this scope.

Required: none. Nits: see aggregated section above.

CLI Logging Expert

No findings.

DevX UX Expert

No findings.

Supply Chain Security Expert

No findings.

Auth Expert

Inactive -- PR #871 only adds regression tests for ContextOptimizer placement logic and a CHANGELOG entry; no auth files (auth.py, token_manager.py, azure_cli.py, github_downloader.py, marketplace/client.py, github_host.py, install/validation.py, registry_proxy.py) are touched.

OSS Growth Hacker

Nits: CHANGELOG entry leads with "Regression tests" rather than the user-facing guarantee; consider reframing to lead with behavior ("Instructions now pin to the deepest relevant directory, not the project root") and then note regression test backing. Side-channel: the subtree-locality correctness guarantee and the file-walk cache are both latent growth stories worth surfacing in future release notes or docs.

_{Verdict computed deterministically: 0 required findings across 5 active panelists. APPROVE iff N == 0. Push a new commit to clear this verdict label automatically.}

Note

🔒 Integrity filter blocked 2 items

The following items were blocked because they don't meet the GitHub integrity level.

#871 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
perf(compile): cache file walk and fix placement for narrow patterns #871 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by PR Review Panel for issue #871 · ● 1.3M · ◷

Copilot AI review requested due to automatic review settings April 23, 2026 09:31

Roozi489 requested review from danielmeppiel and sergio-sisternes-epam as code owners April 23, 2026 09:31

Copilot started reviewing on behalf of Roozi489 April 23, 2026 09:32 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Roozi489 mentioned this pull request Apr 23, 2026

perf(discovery): prune excluded subtrees during traversal #870

Merged

Roozi489 force-pushed the perf/compile-cache-and-placement branch from 99cd6a6 to 99bc8b0 Compare April 26, 2026 13:39

Roozi489 requested a review from Copilot April 26, 2026 14:06

Copilot started reviewing on behalf of Roozi489 April 26, 2026 14:07 View session

Copilot AI reviewed Apr 26, 2026

View reviewed changes

Roozi489 force-pushed the perf/compile-cache-and-placement branch from 99bc8b0 to 1debce8 Compare April 26, 2026 15:02

Roozi489 requested a review from Copilot April 26, 2026 15:19

Copilot started reviewing on behalf of Roozi489 April 26, 2026 15:20 View session

Copilot AI reviewed Apr 26, 2026

View reviewed changes

Roozi489 force-pushed the perf/compile-cache-and-placement branch from 1debce8 to 96c8364 Compare April 26, 2026 16:08

Roozi489 requested a review from Copilot April 27, 2026 07:36

Copilot started reviewing on behalf of Roozi489 April 27, 2026 07:37 View session

Copilot AI reviewed Apr 27, 2026

View reviewed changes

Roozi489 force-pushed the perf/compile-cache-and-placement branch from 96c8364 to 1ae612c Compare April 29, 2026 17:44

danielmeppiel mentioned this pull request Apr 30, 2026

[Summary] Next issues and PRs to solve, prioritized #1064

Closed

danielmeppiel force-pushed the perf/compile-cache-and-placement branch from 1ae612c to db1da2f Compare April 30, 2026 15:43

Copilot and others added 3 commits April 30, 2026 18:15

Merge branch 'main' into perf/compile-cache-and-placement

29a8400

Merge branch 'main' into perf/compile-cache-and-placement

29e52d1

danielmeppiel added the panel-review Trigger the apm-review-panel gh-aw workflow label Apr 30, 2026

github-actions Bot added panel-rejected Apm-review-panel verdict: REJECT. Removed automatically on next push. and removed panel-review Trigger the apm-review-panel gh-aw workflow labels Apr 30, 2026

danielmeppiel added the panel-review Trigger the apm-review-panel gh-aw workflow label Apr 30, 2026

github-actions Bot added panel-approved Apm-review-panel verdict: APPROVE. Removed automatically on next push. and removed panel-review Trigger the apm-review-panel gh-aw workflow labels Apr 30, 2026

danielmeppiel merged commit 43e75b6 into microsoft:main Apr 30, 2026
19 checks passed

danielmeppiel removed the panel-rejected Apm-review-panel verdict: REJECT. Removed automatically on next push. label Apr 30, 2026

This was referenced May 1, 2026

[docs] Update documentation for features from 2026-05-01 #1095

Open

chore(release): cut 0.12.0 #1112

Open

-# These never contain APM primitives or user source files and can be
-# very large (e.g. node_modules, .git objects).  Used by discovery,
-# compilation, and content hashing to avoid expensive walks.
-# NOTE: .apm is intentionally absent -- it is where primitives live.
+# These directories are not relevant to project-source analysis or local
+# primitive discovery and may be very large (e.g. node_modules, .git
+# objects). Used by discovery, compilation, and content hashing to avoid
+# expensive walks.
+# NOTE: .apm is intentionally absent because local project primitives may
+# live there and should still be discovered.

		@@ -1,6 +1,6 @@
		"""Context Optimizer for APM distributed compilation system.
		"""Context Optimizer for APM distributed compilation system.

Conversation

Roozi489 commented Apr 23, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Roozi489 commented Apr 23, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 30, 2026

APM Review Panel Verdict: REJECT

Required before merge (1 item)

Nits (5 items, skip if you want)

CEO arbitration

Python Architect

CLI Logging Expert

DevX UX Expert

Supply Chain Security Expert

Auth Expert

OSS Growth Hacker

Uh oh!

github-actions Bot commented Apr 30, 2026

APM Review Panel Verdict: APPROVE

Required before merge (0 items)