
feat: Built-in agent — LLM-powered AEO analyst with chat API #74

Draft
arberx wants to merge 17 commits into main from feat/agent

Conversation


@arberx arberx commented Mar 16, 2026

Summary

  • Built-in LLM-powered agent with chat API, thread persistence in SQLite, and 9 domain-specific AEO tools (visibility runs, timeline analysis, evidence lookup, competitor analysis, keyword management, sitemap inspection, etc.)
  • CLI command (canonry agent) for interactive terminal chat sessions
  • Multi-provider support — agent threads can use OpenAI, Claude, or Gemini as the backing LLM
  • Security hardening — project ownership verification on thread endpoints, SSRF protection with DNS resolution on sitemap inspection, project-scoped run access
  • Bug fixes — history windowing (newest messages kept, not oldest), complete evidence across all runs, malformed JSON recovery for Claude message replay

Commits

  1. feat: built-in agent — LLM-powered AEO analyst with chat API — core agent loop, tools, store, routes, CLI command, DB migrations
  2. fix(security): Add project ownership verification to thread endpoints
  3. fix(agent): Add error handling for malformed JSON in tool call arguments
  4. perf(agent): Move dynamic imports to top-level
  5. refactor(agent): Replace circular HTTP self-calls with direct service layer
  6. style(agent): Remove dead code and unused types
  7. fix(agent): fix 5 bugs in agent loop and services — history windowing, SSRF validation, project-scoped getRun, complete getHistory, Claude replay fix

Test plan

  • CI passes (typecheck, lint, test)
  • SSRF validation tests cover IPv6 loopback, link-local, ULA, localhost, DNS resolution
  • Manual: canonry agent interactive session against a project with runs
  • Manual: verify thread persistence across server restarts

🤖 Generated with Claude Code


@arberx arberx left a comment


🤖 Automated Review Summary

Files reviewed: 21
Comments left: 13

Issues found:

  • 🔴 Bug: 5
  • 🟡 Security: 2
  • 🟠 Performance: 1
  • 🔵 Type Safety: 2
  • 🟣 Testing: 1
  • ⚪ Style: 1
  • ⚪ Dead code: 1

Key findings

Bugs (fix before merge):

  1. Tool-call persistence ordering (loop.ts ~line 188) — assistant tool-call rows are stored after tool.execute() runs. If execution throws, the DB ends up with a tool result row but no matching assistant row, corrupting thread replay for Claude.
  2. Empty apiKey fallback (server.ts ~line 499) — apiKey ?? '' means a misconfigured provider silently constructs a working handler that returns 401 on every LLM call instead of returning undefined to disable the agent.
  3. ApiClient with undefined apiUrl/apiKey (server.ts ~line 503) — self-hosted instances without apiUrl set will get silent failures on run_sweep and all GSC tools.
  4. Orphaned messages on thread delete (agent.ts ~line 220) — ON DELETE CASCADE only fires if PRAGMA foreign_keys = ON, which is not guaranteed.
  5. Unbounded message field (agent.ts ~line 158) — no maxLength on the message body; a single large payload can rack up LLM token costs.
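Bug 1 comes down to ordering: persist the assistant tool-call row before executing the tool, so a throwing tool can never leave a result row without its matching assistant row. A minimal sketch; all names (`MessageRow`, `saveMessage`, `runToolCall`) are illustrative stand-ins, not the PR's actual API:

```typescript
// Stand-in for the SQLite message store: an in-memory array of rows.
type MessageRow = { role: 'assistant' | 'tool'; content: string };

const db: MessageRow[] = [];
const saveMessage = (row: MessageRow) => { db.push(row); };

async function runToolCall(
  name: string,
  args: string,
  execute: (args: string) => Promise<string>,
): Promise<void> {
  // 1. Persist the assistant tool-call row BEFORE executing the tool.
  saveMessage({ role: 'assistant', content: `call:${name}(${args})` });
  let result: string;
  try {
    result = await execute(args);
  } catch (err) {
    // 2. A failing tool still produces a matching tool-result row,
    //    so thread replay never sees an orphaned half of the pair.
    result = `error: ${(err as Error).message}`;
  }
  saveMessage({ role: 'tool', content: result });
}

// Demo: even when the tool throws, the store ends up with a valid pair.
const demo = runToolCall('get_status', '{}', async () => {
  throw new Error('boom');
});
```

With the original ordering (execute first, persist after), the same failing tool would leave a result row with no assistant row, which is exactly the replay corruption described above.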

Security:

  • dns.resolve6 catch-all silently swallows errors (sitemap-parser.ts) — not exploitable but could incorrectly block IPv6-only hosts.
  • Missing message length limit (covered above under Bugs).

Testing gap:

  • ~830 lines of new agent code ship with zero unit tests. The loop's history-windowing, maxSteps fallback, and JSON recovery paths are all critical and unverified.

Performance:

  • getTimeline has an N+1 query pattern; getHistory already demonstrates the correct bulk-fetch approach.
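The bulk-fetch shape referred to here, one query for all run IDs instead of one query per run, looks roughly like this; `fetchSnapshots` is a hypothetical stand-in for a single Drizzle `inArray()` query:

```typescript
type Snapshot = { runId: string; data: string };

// N+1 fix sketch: collect all run ids up front, issue ONE bulk query,
// then group the rows in memory. fetchSnapshots is illustrative, not
// the PR's actual service method.
async function getTimelineSnapshots(
  runIds: string[],
  fetchSnapshots: (ids: string[]) => Promise<Snapshot[]>,
): Promise<Map<string, Snapshot[]>> {
  const all = await fetchSnapshots(runIds); // one query, not runIds.length
  const byRun = new Map<string, Snapshot[]>();
  for (const s of all) {
    const list = byRun.get(s.runId);
    if (list) list.push(s);
    else byRun.set(s.runId, [s]);
  }
  return byRun;
}
```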

What's done well ✅

  • The SSRF hardening in sitemap-parser.ts is thorough: DNS resolution, IPv4/IPv6 loopback, link-local, ULA, and IPv4-mapped IPv6 addresses are all covered, with tests.
  • History windowing (newest-N-ascending subquery) is the right approach and well-commented.
  • Provider abstraction is clean — adding a new LLM is a one-function addition in llm.ts.
  • Malformed JSON recovery in convertToClaudeMessages is a good defensive touch.
  • DB schema with ON DELETE CASCADE + composite index on (thread_id, created_at) is solid.
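The newest-N-ascending windowing praised above reduces, in array form, to "take the last N of an ascending list". The PR does this in SQL with a DESC/LIMIT subquery re-sorted ascending; this toy sketch shows the equivalent logic:

```typescript
// Equivalent to:
//   SELECT * FROM (SELECT ... ORDER BY created_at DESC LIMIT :n)
//   ORDER BY created_at ASC
// i.e. keep the NEWEST n messages but hand them to the LLM in
// chronological order.
type Msg = { id: number; text: string };

function windowHistory(messages: Msg[], maxHistory: number): Msg[] {
  // `messages` is assumed already sorted ascending by created_at.
  return messages.slice(-maxHistory);
}

const msgs = Array.from({ length: 5 }, (_, i) => ({ id: i + 1, text: `m${i + 1}` }));
console.log(windowHistory(msgs, 3).map((m) => m.id)); // [ 3, 4, 5 ]
```

The original bug (oldest N kept) corresponds to `slice(0, maxHistory)`, which on a long thread drops the user's latest prompt.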

Overall assessment: NEEDS_WORK on the tool-call persistence bug and the ApiClient/apiKey issues before this is safe to merge.

This review was generated by an AI agent. Please verify all suggestions.

arberx added a commit that referenced this pull request Mar 17, 2026
- Fix tool-call persistence ordering: persist assistant row before
  tool.execute() so DB is never left with orphaned tool results
- Guard against empty apiKey: return undefined instead of silently
  constructing a broken handler
- Fall back to localhost:{port} when apiUrl is not configured so
  self-hosted instances can use HTTP-backed agent tools
- Explicitly delete agent_messages before thread deletion (don't
  rely on PRAGMA foreign_keys = ON)
- Add maxLength: 8000 on message body schema (Fastify/Ajv enforcement)
- Fix N+1 in getTimeline: bulk-fetch all snapshots with inArray
- Remove dead claude entry from PROVIDER_ENDPOINTS (uses dedicated path)
- Clean up duplicate projects import alias in server.ts
- Narrow dns.resolve6 catch to ENODATA/ENOTFOUND only
- Add CHECK constraint on agent_messages.role column

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
arberx and others added 12 commits March 17, 2026 14:02
Add a built-in AI agent that uses canonry's own tools to answer
AEO questions, run sweeps, and explain citation changes. No external
agent framework required — just the LLM provider already configured.

Architecture:
- Agent loop modeled after OpenClaw's pattern (LLM ↔ tool ↔ repeat)
- Uses existing provider API keys from canonry config
- Persistence in SQLite (same database, new tables)
- Provider priority: Claude > OpenAI > Gemini (configurable)

New files:
- packages/canonry/src/agent/ — core agent module
  - loop.ts: LLM ↔ tool execution cycle
  - llm.ts: provider-agnostic LLM layer (OpenAI, Claude, Gemini)
  - tools.ts: canonry operations as LLM-callable functions
  - store.ts: thread/message persistence (SQLite)
  - prompt.ts: AEO analyst system prompt
  - types.ts: shared type definitions
- packages/api-routes/src/agent.ts — REST API for chat
- packages/canonry/src/commands/agent.ts — CLI commands

CLI:
  canonry agent ask <project> "message"    — chat with the agent
  canonry agent threads <project>           — list threads
  canonry agent thread <project> <id>       — show thread history

API:
  POST   /api/v1/projects/:project/agent/threads              — create thread
  GET    /api/v1/projects/:project/agent/threads              — list threads
  GET    /api/v1/projects/:project/agent/threads/:id          — get thread + messages
  POST   /api/v1/projects/:project/agent/threads/:id/messages — send message
  DELETE /api/v1/projects/:project/agent/threads/:id          — delete thread

Config:
  agent:
    provider: claude|openai|gemini  (optional, auto-detects)
    model: string                   (optional, uses provider default)
    maxSteps: number                (default: 10)
    maxHistory: number              (default: 30)
    enabled: boolean                (default: true if provider available)

Tools exposed to agent:
  - get_status, run_sweep, get_evidence, get_timeline
  - list_keywords, list_competitors, get_run_details
  - get_gsc_performance, get_gsc_coverage, inspect_url

DB migration:
  - agent_threads: conversation threads per project
  - agent_messages: messages within threads (user/assistant/tool)

Closes #59
Fixes IDOR vulnerability where thread endpoints (get, send message, delete)
accepted a :project param but never verified the thread belonged to that project.

Now all three endpoints verify thread.projectId === project.id before allowing access.

Addresses review comment #1 (Security - CRITICAL)
Wrap JSON.parse(toolCall.function.arguments) in try-catch to prevent crashes
when LLMs return malformed JSON. On parse error, persist the error as a tool
result and continue the agent loop instead of crashing.

Addresses review comment #2 (Bug)
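The guard this commit describes amounts to wrapping the parse and converting failure into a tool result instead of a crash. A sketch with illustrative names:

```typescript
// Malformed-JSON recovery sketch: LLMs occasionally emit truncated or
// invalid JSON for tool-call arguments. On parse failure, return a
// structured error the loop can persist as the tool result and continue.
type ParseResult =
  | { ok: true; args: unknown }
  | { ok: false; error: string };

function parseToolArgs(raw: string): ParseResult {
  try {
    return { ok: true, args: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: `Malformed tool arguments: ${(err as Error).message}` };
  }
}

console.log(parseToolArgs('{"url": "https://example.com"}').ok); // true
console.log(parseToolArgs('{"url": ').ok); // false
```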
Replace dynamic imports of 'eq' and 'projects' table inside the message handler
with static top-level imports to eliminate async overhead on every message.

Addresses review comment #3 (Performance)
… layer

Create AgentServices class that provides direct DB access for agent tools,
eliminating the circular dependency where tools called the server's own HTTP API.

Most read-only tools (get_status, get_evidence, get_timeline, list_keywords,
list_competitors, get_run_details) now use direct DB calls via AgentServices.

Write operations (run_sweep) and external integrations (GSC) still use HTTP
for proper job orchestration and auth handling.

Benefits:
- Eliminates ~1-5ms HTTP localhost roundtrip per tool call
- Removes startup timing dependency
- Simplifies auth config

Addresses review comment #4 (Architecture)
- Remove redundant if/else in llm.ts that set the same Authorization header
- Remove unused ToolDefinition interface (actual interface is AgentTool)

Addresses review comments #5 and #6 (Style)
- P1: History windowing now returns newest N messages (was oldest N,
  causing long threads to drop the user's latest prompt)
- P1: SSRF validation now blocks localhost, IPv6 loopback/private,
  and resolves hostnames to verify they don't point to internal IPs
- P2: getRun() now requires projectName to prevent cross-project
  data access via known run IDs
- P2: getHistory() now queries snapshots for all returned runs
  (was only querying the first run ID)
- P2: convertToClaudeMessages() now handles malformed JSON in
  historical tool calls instead of crashing the thread

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
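The SSRF address checks from the P1 fix above can be sketched as a single predicate. This is a simplification: the real validateSitemapUrl also resolves hostnames via DNS before checking, and the exact range list here is illustrative:

```typescript
// Simplified SSRF address check covering the classes the fix describes:
// IPv4 loopback/RFC 1918/link-local, IPv6 loopback/link-local/ULA, and
// IPv4-mapped IPv6. Not exhaustive (e.g. 0.0.0.0 is not handled).
function isBlockedAddress(ip: string): boolean {
  const v4 = ip.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (v4) {
    const a = Number(v4[1]);
    const b = Number(v4[2]);
    if (a === 127) return true;                          // loopback
    if (a === 10) return true;                           // RFC 1918
    if (a === 172 && b >= 16 && b <= 31) return true;    // RFC 1918
    if (a === 192 && b === 168) return true;             // RFC 1918
    if (a === 169 && b === 254) return true;             // link-local
    return false;
  }
  const v6 = ip.toLowerCase();
  if (v6 === '::1') return true;                         // IPv6 loopback
  if (v6.startsWith('fe80:')) return true;               // link-local
  if (v6.startsWith('fc') || v6.startsWith('fd')) return true; // ULA
  if (v6.startsWith('::ffff:')) return isBlockedAddress(v6.slice(7)); // IPv4-mapped
  return false;
}

console.log(isBlockedAddress('127.0.0.1'), isBlockedAddress('8.8.8.8')); // true false
```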
Critical fixes:
- Fix history truncation splitting tool-call pairs: trim orphaned
  tool/assistant messages at the window boundary
- Add per-thread concurrency guard (409 Conflict if thread is busy)
- Fix get_status returning oldest 3 runs (slice(-3) → slice(0,3))
- Resolve LLM config from registry at call time instead of capturing
  stale API key at startup
- Merge consecutive Claude tool results into single user message to
  avoid invalid same-role sequences

Important fixes:
- Add 20KB truncation cap on tool results to prevent blowing up
  LLM context window
- Guard against empty toolCalls array causing silent spin
- Add 90s timeout on all LLM fetch calls
- Return structured error responses (502 for LLM errors) instead of
  generic 500s
- Fix inconsistent return shape in getHistory (evidence → snapshots)
- Add maxLength/enum validation on thread title and channel fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
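The 20KB tool-result cap above is a straight truncation. The limit matches the commit; the marker text below is an assumption for illustration:

```typescript
// Cap on tool results before they are appended to the LLM context, so a
// single large get_evidence payload cannot blow up every later request.
const MAX_TOOL_RESULT_CHARS = 20_000;

function capToolResult(result: string): string {
  if (result.length <= MAX_TOOL_RESULT_CHARS) return result;
  // Marker text is illustrative, not copied from the PR.
  return result.slice(0, MAX_TOOL_RESULT_CHARS) + '\n…[truncated]';
}
```

Applying the cap per tool result (rather than per request) means one oversized tool cannot crowd out the rest of the window.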
- Rename the agent to "Aero" across CLI output and error messages
- Add soul.md as the agent's identity/personality definition (checked
  into repo as the default, loaded from ~/.canonry/soul.md at runtime
  if the user wants to customize)
- Add memory.md as persistent context that Aero accumulates — loaded
  from ~/.canonry/memory.md at runtime so users can prime the agent
  with project-specific knowledge
- System prompt now composes: soul + project context + tools + memory
- Built-in soul is embedded in prompt.ts so it works after tsup bundling
- Agent remains fully optional: no background processes, only activates
  on explicit user request via CLI or API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Users can now choose which LLM provider Aero uses per message:

- CLI: canonry agent ask <project> "msg" --provider claude
- API: POST /agent/threads/:id/messages { message, provider: "gemini" }

The provider field is optional — omitting it uses the default
(configured in agent.provider or auto-detected: claude > openai > gemini).
If the requested provider isn't configured, returns a clear error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds the /aero route with a full chat interface for interacting with the
built-in Aero agent. Includes project selector, provider/model selector,
thread management (create/delete), message display with optimistic
rendering, and a thinking animation during API calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
arberx and others added 5 commits March 17, 2026 15:12
… background processing

Major Aero agent improvements:

- Memory: get_memory/save_memory tools with pre-seeded domain knowledge
  (citation states, provider grounding mechanics, regression detection)
- Startup sequence: auto-gathers context on new threads, responds naturally
- System tools (opt-in): run_command, read_file, write_file, list_files,
  http_request — gated behind agent.systemTools config flag
- Write tools: add/remove keywords, add/remove competitors, update_project
- Background processing: send-message returns 202, UI polls for completion,
  agent work survives page navigation
- Chat UI: markdown rendering, auto-expanding textarea, inline thread rename,
  relative dates, cleaner thread list, no page scroll
- Claude API fix: bidirectional tool_use/tool_result validation prevents
  orphaned blocks from corrupting conversation history
- CLI polling: agent ask now polls thread status instead of blocking
- Remove unused footer, PATCH endpoint for thread rename, auto-titling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
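The 202-then-poll flow described above can be sketched generically; `getStatus` is a hypothetical stand-in for the client's thread-status call, and the interval/attempt numbers are illustrative:

```typescript
// Poll-until-done sketch: after send-message returns 202, the CLI/UI polls
// thread status until it leaves 'processing' or a deadline passes.
async function pollThread(
  getStatus: () => Promise<'processing' | 'done' | 'error'>,
  { intervalMs = 10, maxAttempts = 20 } = {},
): Promise<'done' | 'error' | 'timeout'> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await getStatus();
    if (status !== 'processing') return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  // Caller should treat this as a failure (non-zero exit code), not success.
  return 'timeout';
}

// Demo: the fake status call reports 'processing' twice, then 'done'.
let calls = 0;
const demoPoll = pollThread(async () => (++calls < 3 ? 'processing' : 'done'));
```

Because the work is keyed to the thread rather than the HTTP request, the same polling survives page navigation in the UI.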
…quest size logging

- Reduce default maxHistoryMessages from 30 to 20 (fewer stale messages)
- Compress tool results older than 8 rows to 500 chars to prevent large
  get_evidence/get_memory results from inflating every subsequent request
- Add stderr logging per request: ~N tokens (M chars, K messages) for debugging
- Version 1.17.0 → 1.18.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…construction

The previous two-pass validation approach had edge cases where the passes
interacted in ways that still left orphaned tool_result blocks (causing
Claude 400 errors at messages.0.content.0).

New approach: state machine that walks the OpenAI-format messages once and
only emits a tool call group (assistant+tool_use → user+tool_result) when
ALL tool_use blocks have matching tool_result blocks. Incomplete groups from
truncated history or server crashes are dropped entirely. Consecutive
same-role messages are merged at the end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
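The single-pass rule described above, emit an assistant tool-call group only when every tool_use id has a matching tool_result, can be sketched on simplified message shapes (the real conversion also handles user/assistant text and merges consecutive same-role messages):

```typescript
// Simplified stand-ins for the OpenAI-format rows the PR walks over.
type OpenAIMsg =
  | { role: 'assistant'; toolCallIds: string[] }
  | { role: 'tool'; toolCallId: string };

// One pass: for each assistant tool-call turn, gather the tool results
// that follow it; emit the whole group only if every call id is matched.
// Incomplete groups (truncated history, crashes) are dropped entirely,
// and leading orphaned tool results are skipped.
function emitCompleteGroups(msgs: OpenAIMsg[]): OpenAIMsg[] {
  const out: OpenAIMsg[] = [];
  for (let i = 0; i < msgs.length; i++) {
    const m = msgs[i];
    if (m.role !== 'assistant') continue; // orphaned tool result: skip
    const results: OpenAIMsg[] = [];
    let j = i + 1;
    while (j < msgs.length && msgs[j].role === 'tool') results.push(msgs[j++]);
    const resultIds = new Set(
      results.map((r) => (r as { toolCallId: string }).toolCallId),
    );
    if (m.toolCallIds.every((id) => resultIds.has(id))) out.push(m, ...results);
    i = j - 1; // resume after the consumed results
  }
  return out;
}
```

This is what prevents the Claude 400 at messages.0.content.0: an unmatched tool_use or tool_result never reaches the API at all.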
Users can now pick a specific model (e.g. Sonnet vs Opus) from the chat
UI when a provider is selected. This avoids rate limit issues when the
provider-level config is set to a model with low rate limits.

Model priority: request model > agent config > provider config > default.
Also syncs DEFAULT_MODELS in llm.ts with MODEL_REGISTRY from contracts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
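The stated priority (request model > agent config > provider config > default) is a nullish-coalescing chain; argument names here are illustrative:

```typescript
// Model resolution sketch: ?? falls through only on null/undefined, so any
// explicitly configured value wins over the fallbacks to its right.
function resolveModel(
  requestModel: string | undefined,
  agentConfigModel: string | undefined,
  providerConfigModel: string | undefined,
  providerDefault: string,
): string {
  return requestModel ?? agentConfigModel ?? providerConfigModel ?? providerDefault;
}
```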
Resolve conflicts: keep both indexing API (from main) and agent
features (from feat/agent). Version stays at 1.19.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

arberx commented Mar 18, 2026

Unsure that this is necessary in Canonry at the moment.

The real power of Canonry is leveraged through an "openclaw"-type agent that has full access to the host operating system and a well-constructed memory.

This agent UI feels like any other agent UI. Not a fan so far.

@arberx arberx marked this pull request as draft March 18, 2026 00:00
process.stdout.write('Aero is thinking...')
}

await client.sendAgentMessage(project, threadId, message, opts?.provider)

Missing error handling on sendAgentMessage: If this call throws (e.g. network error, 4xx/5xx), the exception propagates uncaught and the polling loop below never runs — which is fine. However, if the server accepts the message but never transitions out of 'processing', the loop silently times out and returns an empty string (see line 62–80). A tighter pattern would be to await this, and then check the thread status before entering the poll loop.

if (opts?.format !== 'json') process.stdout.write('.')
}

if (opts?.format !== 'json') console.log('\n')

Silent timeout — no error signal when the agent takes >3 minutes: When i reaches 120 (120 × 1500ms = 3 min), the loop exits without setting process.exitCode or printing an error. The caller receives an empty response string and exit code 0, which looks like success.

// After the loop:
if (!response) {
  console.error('\nTimed out waiting for agent response (3 min).')
  process.exitCode = 1
  return
}

await validateSitemapUrl(url)
}

const res = await fetch(url)

DNS rebinding / TOCTOU: The DNS check in validateSitemapUrl (lines 66–88) happens before fetch(url). A malicious DNS server can return a public IP during validation, then switch to a private IP for the actual fetch request — this is a classic DNS rebinding attack.

Full mitigation requires resolving the hostname to an IP, asserting it's public, then connecting to that specific IP directly (e.g. by passing a custom agent to fetch that pins the resolved IP). The current approach significantly raises the bar vs. the old static regex check, but it is not bulletproof. Worth adding a comment documenting this known limitation so it's not mistaken for a complete fix.

const services = new AgentServices(db)

// ApiClient is only needed for HTTP-backed tools (run_sweep, GSC).
// If apiUrl/apiKey aren't set (self-hosted), those tools will gracefully error.

Security: systemTools defaults to false — good. Consider adding a prominent warning when it's enabled.

When agent.systemTools: true, the built-in agent gains shell execution, file I/O, and HTTP request capabilities. Since the agent operates on user-supplied message content, an adversarial input could abuse these tools. At minimum, log a startup warning when systemTools is true so operators notice, and document clearly in the config schema that this is a dangerous option.
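The suggested startup warning could look like the following sketch; the config shape and message wording are assumptions, not the PR's code:

```typescript
// Log a loud operator-facing warning when the dangerous flag is on.
interface AgentConfig {
  systemTools?: boolean;
}

function warnIfSystemToolsEnabled(
  config: AgentConfig,
  log: (msg: string) => void, // e.g. console.error or a Fastify logger
): void {
  if (config.systemTools) {
    log(
      'WARNING: agent.systemTools is enabled — the agent can run shell ' +
        'commands, read/write files, and make HTTP requests. Only enable ' +
        'this when all agent input is trusted.',
    );
  }
}
```

Calling this once at server startup makes the capability grant visible in logs instead of silent.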


@arberx arberx left a comment


🤖 Automated Review — PR #74 (incremental: new commits since last review)

Summary: Solid agent foundation. The LLM loop, tool system, and CLI commands are well-structured. A few issues flagged inline:

Severity | Finding | File
🟠 Bug | Silent timeout — loop exits with exitCode=0 + empty response after 3 min | commands/agent.ts:83
🟠 Bug | --wait polls indexingState (wrong field — means allowed, not indexed) | commands/google.ts (already in main)
🔴 Security | DNS rebinding TOCTOU in validateSitemapUrl | sitemap-parser.ts:110
🟡 Robustness | sendAgentMessage failure not linked to poll-loop lifecycle | commands/agent.ts:58
🟡 Security | systemTools: true silently grants shell/file/network to agent | server.ts:509

The --wait / INDEXING_ALLOWED bug is already on main — worth a hotfix or follow-up issue independent of this PR.

