fix(manifest): tolerate EPERM/EACCES during corpus walk #12
Merged
Esity merged 1 commit into LegionIO:main on Apr 27, 2026
Find.find had no internal rescue. A single unreadable subdir (common on macOS for TCC-protected paths like ~/Library/Accounts, Mail, Safari) crashed the entire scan and cascaded to the knowledge status HTTP endpoint as a 500. Replaced with a recursive walker that rescues per-dir. Unreadable subdirs are pruned with a debug log; scan continues with siblings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Esity approved these changes on Apr 27, 2026
Summary
Manifest.scan previously used Ruby's Find.find with no per-entry rescue. On macOS, any path walk that descends into a TCC-protected directory (e.g. ~/Library/Accounts, ~/Library/Mail, ~/Library/Safari) raises Errno::EPERM from the enclosing dir_initialize call and aborts the entire traversal. Because Manifest.scan is invoked downstream of Runners::Ingest.scan_corpus and POST /api/knowledge/status, that single unreadable subdir surfaces to the client as an HTTP 500.
This PR replaces Find.find with a small recursive walker built on Dir.children, with a per-directory rescue for the well-known unreadable-tree errnos. Unreadable subdirs are pruned with a debug-level log entry; sibling paths continue to be scanned. The public scan(path:, extensions:) signature and return shape ({ path:, size:, mtime:, sha256: }) are preserved.
Repro
On macOS with the daemon running:
The same call against any path that does not contain a TCC-protected subtree (e.g. /tmp) succeeds. Note that this bug interacts with the Dir.pwd default, whose fix is being filed separately against LegionIO/legionio (/api/knowledge/status previously inherited the daemon's cwd as the default path). The walker fix here is independently valuable: even with the path default fixed upstream, callers can still explicitly pass $HOME or any other path containing a TCC-protected directory, and the scan should not crash on a single unreadable subdir.
The relevant stack trace from a live daemon log on 2026-04-24:
Root cause
Find.find does not rescue Errno::EPERM per entry. When dir_initialize raises it while opening a subdirectory, the exception propagates out of the entire Find.find block and the calling method, terminating the scan. There is no way to instruct Find to skip a subdirectory that fails this way and continue with its siblings.
Fix
Before (lib/legion/extensions/knowledge/helpers/manifest.rb):
After:
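The walker could take roughly the following shape. This is a minimal sketch reconstructed from the description in this PR, not the merged code: the module name ManifestSketch, the stdlib Logger standing in for Legion::Logging, the log message wording, and the choice to skip dot-files as well as dot-directories are all assumptions.

```ruby
require 'digest'
require 'logger'

# Hypothetical stand-in for the gem's Manifest helper (illustrative names only).
module ManifestSketch
  # Errnos named in this PR as "skip and continue" conditions.
  SKIPPABLE = [Errno::EPERM, Errno::EACCES, Errno::ELOOP, Errno::ENOENT].freeze

  # Same keyword signature and entry shape as described in the PR.
  def self.scan(path:, extensions:)
    wanted = extensions.map(&:downcase)
    entries = []
    walk(path, wanted, entries)
    entries
  end

  # Visits a single path. Because the rescue is attached to this method,
  # a failure prunes only this path (and its subtree); the caller's loop
  # over the remaining siblings keeps running.
  def self.walk(path, wanted, entries)
    if File.directory?(path)
      Dir.children(path).each do |name|
        next if name.start_with?('.') # assumption: dot-files are skipped too

        walk(File.join(path, name), wanted, entries)
      end
    elsif File.file?(path) && wanted.include?(File.extname(path).downcase)
      entries << {
        path: path,
        size: File.size(path),
        mtime: File.mtime(path),
        sha256: Digest::SHA256.file(path).hexdigest
      }
    end
  rescue *SKIPPABLE => e
    log.debug("[manifest] skipping unreadable #{path}: #{e.class}")
  end
  private_class_method :walk

  # Assumption: a plain stdlib Logger replaces Legion::Logging in this sketch.
  def self.log
    @log ||= Logger.new($stderr, level: Logger::INFO)
  end
  private_class_method :log
end
```

Under these assumptions, a call such as ManifestSketch.scan(path: Dir.home, extensions: %w[.md .txt]) returns entries for every readable subtree and silently drops the ones that raise.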
Design choices
- Find.find removed entirely. require 'find' is no longer needed: manifest.rb was the only consumer in the gem.
- Dir.children instead of Dir.entries. Dir.children already excludes the . and .. pseudo-entries, so no infinite-recursion guard or per-entry filter is needed for them.
- Rescue catches Errno::EPERM, EACCES, ELOOP, ENOENT. EPERM/EACCES handle the macOS TCC case and standard POSIX permission denials; ELOOP handles symlink cycles; ENOENT handles the race where a file disappears between directory listing and File.size/Digest::SHA256.file (common in macOS's ephemeral caches under ~/Library/Caches).
- log.debug, not log.warn. TCC-protected directories are expected to be unreadable on macOS for any process without Full Disk Access, so a warn-level entry per skipped path would add noise to every scan. Debug is the correct level for "this is fine; move on."
- Local log private_class_method returning Legion::Logging. This matches the existing pattern in sibling files in this gem; concretely, lib/legion/extensions/knowledge/runners/ingest.rb:12-15 defines the same shape. Reusing the pattern keeps the helper-vs-runner module conventions consistent across the gem.
- Public signature preserved. scan(path:, extensions:) and the entry shape { path:, size:, mtime:, sha256: } are unchanged. No breaking changes for Runners::Ingest.scan_corpus, Runners::Corpus.corpus_stats, or any caller via Legion::Apollo.
- Per-entry rescue placement. The rescue lives on walk, which means an EPERM on one subdirectory only prunes that subtree. The recursion that produced sibling subtrees is unaffected.
Operational note on log level
log.debug is used intentionally (not warn) because TCC-protected directories
are expected and non-actionable on macOS: every scan that touches a home
directory will skip several. Operators running with LOG_LEVEL=DEBUG in
production should be aware that this method will emit one debug entry per
unreadable path, and each entry contains the path string itself. If your
environment forwards debug-level entries to an aggregation backend and the
path strings are sensitive, either:
- keep LOG_LEVEL at INFO or higher in production (the default), or
- filter entries matching [manifest] skipping unreadable in your log pipeline.
Standard log aggregation pipelines filter debug entries by default, so this is
informational rather than a behavior change.
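The effect of the log level can be illustrated with Ruby's stdlib Logger. This is only a sketch: Logger stands in for Legion::Logging (whose API is not shown in this PR), and the example path is made up.

```ruby
require 'logger'
require 'stringio'

# Capture output in memory so the effect of the level is easy to inspect.
sink = StringIO.new
logger = Logger.new(sink, level: Logger::INFO)

# At INFO, the per-path debug entry is dropped before it reaches the sink.
logger.debug('[manifest] skipping unreadable /Users/example/Library/Mail')
logger.info('scan complete')
at_info = sink.string.dup

# At DEBUG, the same call writes the path string to the sink.
logger.level = Logger::DEBUG
logger.debug('[manifest] skipping unreadable /Users/example/Library/Mail')
at_debug = sink.string.dup
```

Here at_info contains only the info entry, while at_debug also contains the path-bearing debug entry, which is the output to account for if debug logs are forwarded off-host.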
Tests
Added to spec/legion/extensions/knowledge/helpers/manifest_spec.rb:
- treats extension filter as case-insensitive: .MD, .TxT regression guard.
- skips dot-directories and does not recurse into them: pruning regression guard.
- skips unreadable directories and continues scanning siblings: Errno::EPERM from one sibling, asserts the other is still scanned.
- skips unreadable directories raising Errno::EACCES: same shape, different errno.
- skips multiple unreadable subdirs at different depths without failing: EPERM at depth 1 and EACCES at depth 2.
- skips files that disappear between listing and read (ENOENT): stubs File.size to raise Errno::ENOENT; asserts the scan returns the surviving sibling.
- does not crash when the scan root itself is unreadable: defensive guard for the case where the top-level path itself raises.
All tests use Dir.mktmpdir for real on-disk paths plus allow(...).to receive(...).and_raise(...) for the errno injection, so they don't depend on any host filesystem layout.
Result: 18/18 in spec/legion/extensions/knowledge/helpers/manifest_spec.rb; 197/197 full suite.
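The errno-injection style used by these specs can be approximated in plain Ruby, without RSpec. The sketch below mirrors the ENOENT case; mini_scan and with_stub are hypothetical helpers written for this illustration and are not part of the gem or its spec file.

```ruby
require 'tmpdir'

# Hypothetical stripped-down walker: records only path and size.
def mini_scan(path, found = [])
  if File.directory?(path)
    Dir.children(path).each { |name| mini_scan(File.join(path, name), found) }
  else
    found << { path: path, size: File.size(path) }
  end
  found
rescue Errno::EPERM, Errno::EACCES, Errno::ELOOP, Errno::ENOENT
  found # prune this path only; the caller keeps iterating its siblings
end

# Temporarily swaps a singleton method, then routes calls back to the original.
def with_stub(target, name, replacement)
  original = target.method(name)
  target.define_singleton_method(name, &replacement)
  yield
ensure
  if original
    target.define_singleton_method(name) do |*args, **kw, &blk|
      original.call(*args, **kw, &blk)
    end
  end
end

result = Dir.mktmpdir do |root|
  File.write(File.join(root, 'kept.md'), 'still here')
  File.write(File.join(root, 'gone.md'), 'vanishes')

  # Simulate a file that disappears between the listing and the size read.
  real_size = File.method(:size)
  vanished = lambda do |path|
    raise Errno::ENOENT, path if File.basename(path) == 'gone.md'

    real_size.call(path)
  end

  with_stub(File, :size, vanished) { mini_scan(root) }
end
```

Because the files live in a fresh temporary directory and the failure is injected rather than provoked, the check behaves the same on any host, which is the property the spec relies on.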
Version
0.6.7 → 0.6.9 (skipping 0.6.8).
CHANGELOG entry added under [0.6.9] → Fixed:
Live validation
The same patch has been running on the local Cellar copy (/opt/homebrew/Cellar/legionio/1.9.0-1/libexec/lib/ruby/gems/3.4.0/gems/lex-knowledge-0.6.7/lib/legion/extensions/knowledge/helpers/manifest.rb, mtime 2026-04-24 15:12:36) for roughly 75 minutes prior to this PR being filed. The daemon log at /opt/homebrew/var/log/legion/legion.log shows:
- The last Errno::EPERM @ dir_initialize - /Users/<you>/Library/Accounts stack trace is at 2026-04-24 14:56:25, i.e. before the patch was applied.
- No further occurrences after 2026-04-24 15:12 despite continued daemon activity (thousands of log lines in the window between 15:00 and 16:00).
- POST /api/knowledge/status now returns a normal scan result for paths containing TCC-protected subtrees, where it previously returned 500.
Related
LegionIO/legionio has a separate PR being filed against lib/legion/api/knowledge.rb to remove the || Dir.pwd default on the /api/knowledge/status route. Both fixes target the same user-visible 500 but are independent: the API-level fix addresses "the daemon should not silently inherit its cwd"; this PR addresses "even when an explicit path is passed, a single unreadable subdir should not abort the whole scan." Either fix alone narrows the failure surface; together they close it.
Checklist
- Tests pass (bundle exec rspec): 18/18 in spec/legion/extensions/knowledge/helpers/manifest_spec.rb; 197/197 full suite
- Lint passes (bundle exec rubocop): 37 files inspected, no offenses
- CHANGELOG updated ([0.6.9] → Fixed: entry shown in Version section)
- New log output appears only at LOG_LEVEL=DEBUG (default INFO filters it). Operational note in Design choices above documents this for production deployments.