Context
Follow-up from PR #365 review (EPAM Phase D recommendation). Addresses I/O-heavy checks at monorepo scale (200+ deps, 1,500+ deployed files).
Problem
Three CI checks are disk I/O bound at scale:
| Check | Current complexity | At scale (1,500 files) | Bottleneck |
|---|---|---|---|
| content-integrity | O(F × C) | 1,500 file reads + char scans | Disk I/O |
| deployed-files-present | O(F_total) | 1,500 stat() calls | Disk I/O |
| unmanaged-files | O(G) | 500 stat + path ops | Disk I/O |
Proposed Optimizations
1. Incremental content scanning
Only scan deployed files whose mtime is newer than the lockfile's generated_at timestamp. For PR-triggered CI, this typically limits the scan to 2-5 changed files instead of all 1,500.
Estimated latency reduction: ~90% for typical CI runs.
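A minimal sketch of the mtime filter, not the project's actual implementation. The function name `files_changed_since` is hypothetical, and the sketch assumes `generated_at` is stored as an ISO 8601 string and that a naive timestamp means UTC. It also assumes file mtimes are meaningful in the CI environment; a fresh git checkout sets mtimes to checkout time, so this may need a cached workspace or another change signal to pay off.

```python
from datetime import datetime, timezone
from pathlib import Path


def files_changed_since(paths, generated_at):
    """Return the paths whose mtime is strictly newer than generated_at.

    generated_at: ISO 8601 string (assumed lockfile format), for example
    "2024-01-01T00:00:00+00:00".
    """
    baseline = datetime.fromisoformat(generated_at)
    if baseline.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        baseline = baseline.replace(tzinfo=timezone.utc)
    cutoff = baseline.timestamp()

    changed = []
    for p in paths:
        try:
            # One stat() per file, but no read or character scan
            # for files that have not changed since the baseline.
            if Path(p).stat().st_mtime > cutoff:
                changed.append(p)
        except FileNotFoundError:
            # Missing files are left to the presence check.
            continue
    return changed
```

Only the returned paths would then go through the full content scan. A caller would likely fall back to scanning everything when the lockfile carries no usable timestamp.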
2. Batched stat() calls in _check_deployed_files_present
Walk each unique parent directory once via os.scandir() to build a set of existing files. This replaces 1,500 individual stat() calls with ~20 scandir() calls plus O(1) set lookups.
Estimated improvement: ~10x on network-mounted CI filesystems.
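The batching idea could look roughly like the sketch below. The function name `find_missing_files` is hypothetical and is not the body of `_check_deployed_files_present`; it assumes the expected paths are plain file paths that can be grouped by parent directory.

```python
import os
from collections import defaultdict


def find_missing_files(expected_paths):
    """Return the expected paths that do not exist as files on disk."""
    # Group expected paths by parent directory so each directory
    # is listed exactly once.
    by_dir = defaultdict(list)
    for path in expected_paths:
        by_dir[os.path.dirname(path) or "."].append(path)

    missing = []
    for directory, paths in by_dir.items():
        try:
            with os.scandir(directory) as entries:
                # is_file() can usually be answered from the directory
                # listing itself, without a separate stat() call.
                present = {e.name for e in entries if e.is_file()}
        except (FileNotFoundError, NotADirectoryError):
            # The parent directory is absent, so every file in it is missing.
            present = set()
        missing.extend(p for p in paths if os.path.basename(p) not in present)
    return missing
```

The result is grouped by directory rather than kept in input order, so callers that need stable output should sort it.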
Acceptance Criteria
- generated_at as baseline
- os.scandir() batch approach for file presence checks
- generated_at