feat: filesystem connector — emits observations on file changes#163
First of the data-source connectors tracked in #62. Ships as @agentmemory/fs-watcher under integrations/filesystem-watcher/.

Scope (v0.1.0):
- Watches one or more roots via node:fs.watch({recursive: true}). No native deps; works on macOS / Linux / Windows 10+.
- Emits file_change / file_delete observations to POST /agentmemory/observe on the running server.
- Debounces 500ms per path so a stream of editor saves collapses to a single observation.
- Reads the first 4 KB of text files for content preview; truncation is flagged in metadata. Binary files are skipped by default (AGENTMEMORY_FS_WATCH_ALLOW_BINARY=1 to opt in).
- Default ignore set: .git, node_modules, dist, build, .next, .turbo, coverage, .DS_Store, *.log, *.lock. Extensible via AGENTMEMORY_FS_WATCH_IGNORE.
- Attaches Bearer AGENTMEMORY_SECRET when set.
- CLI and env-driven config; CLI args win. Can be supervised with launchd / systemd / pm2.

Tests (test/fs-watcher.test.ts):
- file_change on write, file_delete on unlink.
- Default ignore set skips node_modules/.
- Bearer auth header attached when secret is configured.
- Debounce collapses four rapid writes to <=2 observations.
- configFromEnv parses dirs and regex ignore patterns.

GitHub and Slack connectors (from #62) will be separate PRs since they need OAuth / token handling and a shared event-normalizer; the file-watcher has no such dependency and is useful standalone.
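The per-path debounce in the scope list could be sketched roughly as below. This is a minimal illustration, not the PR's code: `createPathDebouncer` and the injectable `timers` pair are hypothetical names introduced here so the behavior can be exercised without real delays.

```javascript
// Hypothetical helper (not from the PR): collapses bursts of events for the
// same path into a single flush call once the path has been quiet for delayMs.
function createPathDebouncer(
  flush,
  delayMs = 500,
  // Assumption: timers are injectable so tests can substitute fake ones.
  timers = {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (handle) => clearTimeout(handle),
  },
) {
  const pending = new Map(); // path -> timer handle

  return function schedule(path) {
    const existing = pending.get(path);
    if (existing !== undefined) {
      // A newer event for the same path replaces the older pending timer.
      timers.clear(existing);
    }
    const handle = timers.set(() => {
      pending.delete(path);
      flush(path);
    }, delayMs);
    pending.set(path, handle);
  };
}
```

Keying the timer map by path means a burst of saves to one file never delays the observation for a different file.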
📝 Walkthrough
Adds a new filesystem-watcher integration: docs, CLI entrypoint, npm package manifest, a FilesystemWatcher implementation that emits file_change/file_delete observations to an agentmemory server, and tests covering behavior and configuration.
Sequence Diagram(s)
sequenceDiagram
participant FS as Filesystem
participant Watcher as FilesystemWatcher
participant Timer as DebounceTimer
participant Server as AgentMemoryServer
rect rgba(100,200,100,0.5)
note over FS,Server: File change detection and emission flow
FS->>Watcher: fs.watch event (filename)
Watcher->>Watcher: normalize path & apply ignore rules
Watcher->>Timer: schedule 500ms debounce for path
Timer->>Watcher: debounce elapsed for path
Watcher->>FS: statSync(path) -> exists?
alt exists and is file
Watcher->>FS: read preview (UTF-8 up to 4096 bytes)
Watcher->>Watcher: build payload (file_change, metadata, truncated flag)
else deleted or missing
Watcher->>Watcher: build payload (file_delete, metadata)
end
Watcher->>Server: POST /agentmemory/observe (JSON + optional Authorization)
Server-->>Watcher: HTTP response (2xx / non-OK logged)
end
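The preview step in the diagram (read up to 4096 bytes as UTF-8 and set a truncated flag) might look something like this. It is a sketch under assumptions: `previewFromBuffer` is a hypothetical name, and the PR's own readPreview may read from a file handle rather than a buffer already in memory.

```javascript
// Hypothetical helper (not from the PR): derive a text preview and a
// truncation flag from bytes that have already been read.
const PREVIEW_BYTES = 4096; // the 4 KB limit stated in the PR description

function previewFromBuffer(buf, limit = PREVIEW_BYTES) {
  const truncated = buf.length > limit;
  // Note: cutting at a byte boundary can split a multi-byte UTF-8 character,
  // in which case the decoder emits a replacement character at the end.
  const content = buf.subarray(0, limit).toString("utf8");
  return { content, truncated };
}
```

For example, `previewFromBuffer(Buffer.from("hello"))` yields the full text with `truncated` set to `false`, while a 5000-byte buffer yields 4096 bytes of content with `truncated` set to `true`.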
Estimated Code Review Effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integrations/filesystem-watcher/package.json`:
- Line 14: The package currently declares "engines": { "node": ">=18" } but uses
fs.watch(..., { recursive: true }) which is unavailable on Linux for Node 18;
either raise the Node engine floor in package.json to ">=20" (or ">=19.1.0") to
guarantee recursive watch support, or implement a runtime fallback in the
watcher code where fs.watch is called: detect Node/version and platform
(process.version / process.platform) and, if Node < 19.1.0 on Linux, replace the
recursive fs.watch usage with a non-recursive per-directory watcher that walks
the target tree and creates individual fs.watch watchers for each directory (and
updates them on directory add/remove); update package.json "engines" if you
choose the version bump.
In `@integrations/filesystem-watcher/watcher.mjs`:
- Around line 121-135: The payload currently built before calling this.emit
contains files/content/metadata at top level and may send null project/sessionId
and omit cwd/timestamp; change the payload shape to match the API's HookPayload
by ensuring non-empty hookType, sessionId and project (validate/throw on startup
if they are missing), add cwd (use rootDir) and timestamp (ISO string of now),
and move observation details into a data object (e.g. data: { files: [relPath],
content: this.formatContent(...), metadata: { source: "filesystem-watcher",
rootDir, absPath, size, truncated } }); then call this.emit with that corrected
payload so this.emit and this.formatContent are passed the API-required shape.
- Around line 149-171: The loop that calls watch(root, ...) logs failures per
root but allows startup to proceed even if no watchers were created; update the
code after the for (const root of this.roots) loop to check if
this.watchers.length === 0 and fail startup by logging a clear error via
this.logger.error and throwing an Error (or otherwise rejecting initialization)
so the process doesn't appear healthy when no fs.watch handles were obtained;
refer to this.roots, watch(...), this.watchers, this.logger, schedule, and
isIgnored to locate the watch setup and place the guard immediately after it.
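The runtime check suggested in the first comment (detect Node version and platform before relying on recursive watching) could be sketched as follows. `supportsRecursiveWatch` is a hypothetical helper, not part of the PR, and treating platforms other than macOS, Windows, and Linux as unsupported is a conservative assumption made here.

```javascript
// Hypothetical helper (not from the PR): decide whether
// fs.watch(dir, { recursive: true }) can be relied on.
function supportsRecursiveWatch(
  version = process.version,
  platform = process.platform,
) {
  // Recursive watching has long been available on macOS and Windows.
  if (platform === "darwin" || platform === "win32") return true;

  const [major, minor] = version.replace(/^v/, "").split(".").map(Number);
  if (platform === "linux") {
    // Per the review comment, Linux support landed in Node 19.1.0.
    return major > 19 || (major === 19 && minor >= 1);
  }
  // Assumption: stay conservative on any other platform.
  return false;
}
```

A watcher could call this once at startup and either fall back to per-directory watchers or fail with a clear message when it returns `false`.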
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 70312a72-7f3d-4834-9e77-238b0998cfb3
📒 Files selected for processing (5)
- integrations/filesystem-watcher/README.md
- integrations/filesystem-watcher/bin.mjs
- integrations/filesystem-watcher/package.json
- integrations/filesystem-watcher/watcher.mjs
- test/fs-watcher.test.ts
Three issues flagged by review.

1. CRITICAL - observe endpoint rejects current payload shape. The /agentmemory/observe handler requires a full HookPayload (non-empty hookType from the fixed HookType union, sessionId, project, cwd, timestamp) with observation data under 'data'. The watcher was emitting hookType='file_change'/'file_delete' (not valid HookType values), possibly-null sessionId and project, no cwd or timestamp, and files/content/metadata at the top level. The server would 400 every observation. Now emits hookType='post_tool_use' with changeKind moved into data.changeKind. sessionId auto-generates as fs-watcher-<ts>-<rand> when not supplied; project defaults to the first watched directory's basename. cwd is set to the root that triggered the event, timestamp to new Date().toISOString(). files/content/rootDir/absPath/size/truncated/source all sit under data.
2. MAJOR - start() returned successfully even when every fs.watch call failed. In CLI/supervisor usage that looked like a healthy start while no observations ever fired. Now start() throws if this.watchers ends up empty, with the node-version hint in the message since Node 18 + Linux is the likely cause.
3. MAJOR - engines.node was >=18, but fs.watch(recursive: true) on Linux only landed in Node 19.1.0. Raised to >=20 (next LTS after 19.x) and documented the requirement in README.

test/fs-watcher.test.ts:
- Updated two shape assertions (hookType, sessionId, project, cwd, timestamp at top level; files/content/changeKind under data).
- Added test that start() throws when no roots can be watched.
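The corrected payload shape described in item 1 can be illustrated with a small builder. This is only a sketch: `makeSessionId` and `buildObservation` are hypothetical names, the base-36 random suffix is an assumption (the reply only gives the fs-watcher-<ts>-<rand> pattern), and the field list follows the reply and the earlier review comment rather than the server's actual HookPayload type.

```javascript
// Hypothetical helpers (not from the PR) illustrating the described shape.

// Assumption: <rand> is rendered as a short base-36 string.
function makeSessionId(now = Date.now(), rand = Math.random()) {
  return `fs-watcher-${now}-${rand.toString(36).slice(2, 8)}`;
}

function buildObservation({
  changeKind, // "file_change" or "file_delete", now carried under data
  rootDir,
  relPath,
  absPath,
  sessionId,
  project,
  content = null,
  size = null,
  truncated = false,
}) {
  return {
    hookType: "post_tool_use",
    sessionId,
    project,
    cwd: rootDir, // the root that triggered the event
    timestamp: new Date().toISOString(),
    data: {
      changeKind,
      files: [relPath],
      content,
      rootDir,
      absPath,
      size,
      truncated,
      source: "filesystem-watcher",
    },
  };
}
```

Keeping the envelope fields (hookType, sessionId, project, cwd, timestamp) at the top level and everything watcher-specific under `data` is what the reply says the observe handler expects.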
Actionable comments posted: 2
🧹 Nitpick comments (1)
integrations/filesystem-watcher/watcher.mjs (1)
108-147: Minor: prefer fsp.stat over statSync in the debounced flush.
flush is already async and awaits readPreview; using statSync here blocks the event loop for no reason and is inconsistent with the rest of the I/O in the class. Not a correctness issue, just tidier:
- try {
-   const st = statSync(absPath);
-   if (!st.isFile()) return;
-   size = st.size;
- } catch {
-   exists = false;
- }
+ try {
+   const st = await fsp.stat(absPath);
+   if (!st.isFile()) return;
+   size = st.size;
+ } catch {
+   exists = false;
+ }
(You can then drop statSync from the node:fs import on Line 1.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integrations/filesystem-watcher/watcher.mjs` around lines 108 - 147, The flush method uses the synchronous statSync which blocks the event loop; replace it with the asynchronous fsp.stat (or fs.promises.stat) inside async flush in watcher.mjs: call await fsp.stat(absPath) (or use try { const st = await stat(absPath) } if imported as stat from fs/promises), keep the same checks (st.isFile(), st.size) and preserve the exists/size/truncated logic and subsequent calls to isTextFile, readPreview, formatContent and emit; also remove statSync from the top-level imports after switching to the promise-based stat.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integrations/filesystem-watcher/watcher.mjs`:
- Around line 213-217: The code that builds extraIgnore from
env.AGENTMEMORY_FS_WATCH_IGNORE can throw on invalid patterns and crash startup;
update the logic in configFromEnv (the extraIgnore construction) to safely parse
each CSV entry by wrapping new RegExp(s) in a try/catch, skipping entries that
throw and emitting a warning (e.g., via console.warn or the module logger) that
includes the offending pattern, and ensure extraIgnore still becomes an array of
valid RegExp objects; keep the trimming/filtering behavior and preserve
empty/default handling.
- Around line 5-27: The watcher currently treats `.env` files as text to preview
because ".env" is included in TEXT_EXTENSIONS and there is no ignore pattern for
dotenv files in DEFAULT_IGNORE; update DEFAULT_IGNORE to include common dotenv
patterns (e.g., /\.env(?:\.[^\/]+)?$/ or variants matching .env, .env.local,
.env.production) and remove ".env" from TEXT_EXTENSIONS (or add a runtime
exclusion for files matching the dotenv patterns before creating previews) so
these secrets are not read or POSTed by the observe flow; ensure changes
reference TEXT_EXTENSIONS and DEFAULT_IGNORE so reviewers can locate the
modifications and consider documenting the new default in the README with
guidance about AGENTMEMORY_FS_WATCH_IGNORE for opt-in behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d74ac6a6-6c23-4e31-b860-905c2c89daee
📒 Files selected for processing (4)
- integrations/filesystem-watcher/README.md
- integrations/filesystem-watcher/package.json
- integrations/filesystem-watcher/watcher.mjs
- test/fs-watcher.test.ts
✅ Files skipped from review due to trivial changes (1)
- integrations/filesystem-watcher/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- integrations/filesystem-watcher/package.json
- test/fs-watcher.test.ts
const TEXT_EXTENSIONS = new Set([
  ".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs",
  ".py", ".rb", ".go", ".rs", ".java", ".kt", ".swift",
  ".c", ".cc", ".cpp", ".h", ".hpp",
  ".md", ".mdx", ".txt", ".rst",
  ".json", ".yaml", ".yml", ".toml", ".ini", ".env",
  ".html", ".css", ".scss", ".vue", ".svelte",
  ".sh", ".bash", ".zsh", ".fish",
  ".sql", ".graphql", ".proto",
]);

const DEFAULT_IGNORE = [
  /(?:^|\/)\.git(?:\/|$)/,
  /(?:^|\/)node_modules(?:\/|$)/,
  /(?:^|\/)dist(?:\/|$)/,
  /(?:^|\/)build(?:\/|$)/,
  /(?:^|\/)\.next(?:\/|$)/,
  /(?:^|\/)\.turbo(?:\/|$)/,
  /(?:^|\/)coverage(?:\/|$)/,
  /(?:^|\/)\.DS_Store$/,
  /\.log$/,
  /\.lock$/,
];
.env files will be read and POSTed — secrets leak risk.
.env is in TEXT_EXTENSIONS (Line 10) but is not in DEFAULT_IGNORE. As a result, saving .env, .env.local, .env.production, etc. causes the watcher to read the first 4 KB and send it as data.content to /agentmemory/observe. For most users these files hold DB passwords, API keys, OAuth secrets — exactly the things that shouldn't leave the machine, and they are typically untouched by .gitignore-style lists at the watcher level.
Two reasonable fixes (either is fine; both is safest):
🛡️ Add dotenv files to default ignore and drop them from text previews
const TEXT_EXTENSIONS = new Set([
".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs",
".py", ".rb", ".go", ".rs", ".java", ".kt", ".swift",
".c", ".cc", ".cpp", ".h", ".hpp",
".md", ".mdx", ".txt", ".rst",
- ".json", ".yaml", ".yml", ".toml", ".ini", ".env",
+ ".json", ".yaml", ".yml", ".toml", ".ini",
".html", ".css", ".scss", ".vue", ".svelte",
".sh", ".bash", ".zsh", ".fish",
".sql", ".graphql", ".proto",
]);
const DEFAULT_IGNORE = [
/(?:^|\/)\.git(?:\/|$)/,
/(?:^|\/)node_modules(?:\/|$)/,
/(?:^|\/)dist(?:\/|$)/,
/(?:^|\/)build(?:\/|$)/,
/(?:^|\/)\.next(?:\/|$)/,
/(?:^|\/)\.turbo(?:\/|$)/,
/(?:^|\/)coverage(?:\/|$)/,
/(?:^|\/)\.DS_Store$/,
+ /(?:^|\/)\.env(?:\..+)?$/,
+ /(?:^|\/)(?:id_rsa|id_ed25519|id_ecdsa|id_dsa)(?:\.pub)?$/,
+ /\.pem$/,
+ /\.key$/,
/\.log$/,
/\.lock$/,
];
Worth calling out in the README so users who want .env observations can opt back in via AGENTMEMORY_FS_WATCH_IGNORE overrides (or a future allow-list).
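To see what the proposed secret-related patterns do and do not match, they can be exercised in isolation. This is a sketch only: `SECRET_IGNORE` and this standalone `isIgnored` are hypothetical stand-ins, and the backslash normalization is an assumption about how the watcher prepares paths before matching.

```javascript
// Patterns copied from the suggested diff; the array name is hypothetical.
const SECRET_IGNORE = [
  /(?:^|\/)\.env(?:\..+)?$/,
  /(?:^|\/)(?:id_rsa|id_ed25519|id_ecdsa|id_dsa)(?:\.pub)?$/,
  /\.pem$/,
  /\.key$/,
];

// Hypothetical stand-in for the watcher's ignore check.
function isIgnored(relPath, patterns = SECRET_IGNORE) {
  // Assumption: Windows separators are normalized to "/" before matching.
  const normalized = relPath.replace(/\\/g, "/");
  return patterns.some((re) => re.test(normalized));
}

for (const p of [".env", ".env.local", "apps/web/.env.production", "src/env.ts", ".envrc"]) {
  console.log(p, isIgnored(p));
}
```

With these patterns `.env`, `.env.local`, and `apps/web/.env.production` are ignored, while `src/env.ts` and `.envrc` are not, so a project that keeps secrets in `.envrc` would still need an extra pattern.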
const extraIgnore = (env.AGENTMEMORY_FS_WATCH_IGNORE || "")
  .split(",")
  .map((s) => s.trim())
  .filter(Boolean)
  .map((s) => new RegExp(s));
Invalid user regex will crash configFromEnv at startup.
new RegExp(s) throws SyntaxError on malformed patterns (e.g. AGENTMEMORY_FS_WATCH_IGNORE="foo,["), which will kill the CLI before start() runs with no actionable log. Wrap and skip/warn so a single bad entry doesn't take down the watcher.
♻️ Tolerant regex parsing
- const extraIgnore = (env.AGENTMEMORY_FS_WATCH_IGNORE || "")
- .split(",")
- .map((s) => s.trim())
- .filter(Boolean)
- .map((s) => new RegExp(s));
+ const extraIgnore = (env.AGENTMEMORY_FS_WATCH_IGNORE || "")
+ .split(",")
+ .map((s) => s.trim())
+ .filter(Boolean)
+ .flatMap((s) => {
+ try {
+ return [new RegExp(s)];
+ } catch (err) {
+ console.warn(
+ `[fs-watcher] ignoring invalid AGENTMEMORY_FS_WATCH_IGNORE pattern ${JSON.stringify(s)}: ${err?.message || err}`,
+ );
+ return [];
+ }
+ });
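The same tolerant parsing can be pulled into a small function to check its behavior on the malformed input from the comment. This is a sketch, assuming a hypothetical `parseIgnorePatterns` helper with an injectable `warn` callback; the PR itself inlines the logic in configFromEnv.

```javascript
// Hypothetical helper (not from the PR): parse a comma-separated list of
// regex sources, skipping entries that fail to compile.
function parseIgnorePatterns(csv, warn = console.warn) {
  return (csv || "")
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean)
    .flatMap((s) => {
      try {
        return [new RegExp(s)];
      } catch (err) {
        // One bad entry is reported and dropped instead of aborting startup.
        warn(
          `[fs-watcher] ignoring invalid pattern ${JSON.stringify(s)}: ${err?.message || err}`,
        );
        return [];
      }
    });
}

// Usage: "[" is not a valid regex, so only "foo" survives.
const warnings = [];
const patterns = parseIgnorePatterns("foo,[", (msg) => warnings.push(msg));
console.log(patterns.length, warnings.length);
```

Passing the warning sink as a parameter keeps the function easy to test and lets the caller route messages to the module logger instead of the console.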
Bump version + ship CHANGELOG covering everything that merged since v0.8.13:
- #118 security advisory drafts for v0.8.2 CVEs
- #132 semantic eviction routing + batched retention audit
- #157 iii console docs + vendored screenshots in README
- #160 (#158) health gated on RSS floor
- #161 (#159) standalone MCP proxies to the running server
- #162 (#125) mem::forget audit coverage + policy doc
- #163 (#62) @agentmemory/fs-watcher filesystem connector
- #164 Next.js website (website/ root, ship to Vercel)

Version bumps (8 files):
- package.json / package-lock.json (top + packages[''])
- plugin/.claude-plugin/plugin.json
- packages/mcp/package.json (self + ~0.9.0 dep pin)
- src/version.ts (union extended, assigned 0.9.0)
- src/types.ts (ExportData.version union)
- src/functions/export-import.ts (supportedVersions set)
- test/export-import.test.ts (export assertion)

Tests: 777 passing. Build clean.
First of the data-source connectors tracked in #62.

What
A standalone Node package under integrations/filesystem-watcher/ (published as @agentmemory/fs-watcher) that watches one or more directories and emits a file_change / file_delete observation to POST /agentmemory/observe every time a file changes.

Scope (v0.1.0)
- Watches via node:fs.watch({ recursive: true }); no native deps. Works on macOS / Linux / Windows 10+.
- Debounces 500ms per path so a stream of editor saves collapses to a single observation.
- Reads the first 4 KB of text files for content preview; truncation is flagged in metadata.truncated. Binary files skipped by default (AGENTMEMORY_FS_WATCH_ALLOW_BINARY=1 to opt in).
- Default ignore set: .git, node_modules, dist, build, .next, .turbo, coverage, .DS_Store, *.log, *.lock. Extend with AGENTMEMORY_FS_WATCH_IGNORE=regex1,regex2.
- Attaches Authorization: Bearer ${AGENTMEMORY_SECRET} when set.
- CLI and env-driven config; CLI args win. Can be supervised with launchd / systemd / pm2.

Tests
test/fs-watcher.test.ts (7 cases):
- file_change emitted on write.
- file_delete emitted on unlink.
- Default ignore set skips node_modules/.
- Bearer auth header attached when secret is configured.
- Debounce collapses four rapid writes to <=2 observations.
- configFromEnv parses comma-separated dirs + regex ignore patterns.

All 7 pass in ~4s (test uses real FS events against a mkdtempSync root).

Follow-ups (not in this PR)
- compress/enrich doesn't have to know where the event came from. Worth extracting once the second connector exists.

I'll open separate issues / PRs for those. This one is scoped deliberately narrow so it can land and start being useful.