feat: Built-in agent — LLM-powered AEO analyst with chat API#74
Conversation
arberx
left a comment
🤖 Automated Review Summary
Files reviewed: 21
Comments left: 13
Issues found:
- 🔴 Bug: 5
- 🟡 Security: 2
- 🟠 Performance: 1
- 🔵 Type Safety: 2
- 🟣 Testing: 1
- ⚪ Style: 1
- ⚪ Dead code: 1
Key findings
Bugs (fix before merge):
- Tool-call persistence ordering (`loop.ts` ~line 188) — assistant tool-call rows are stored after `tool.execute()` runs. If execution throws, the DB ends up with a `tool` result row but no matching `assistant` row, corrupting thread replay for Claude.
- Empty `apiKey` fallback (`server.ts` ~line 499) — `apiKey ?? ''` means a misconfigured provider silently constructs a working handler that returns 401 on every LLM call instead of returning `undefined` to disable the agent.
- `ApiClient` with undefined `apiUrl`/`apiKey` (`server.ts` ~line 503) — self-hosted instances without `apiUrl` set will get silent failures on `run_sweep` and all GSC tools.
- Orphaned messages on thread delete (`agent.ts` ~line 220) — `ON DELETE CASCADE` only fires if `PRAGMA foreign_keys = ON`, which is not guaranteed.
- Unbounded `message` field (`agent.ts` ~line 158) — no `maxLength` on the message body; a single large payload can rack up LLM token costs.
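The first bug above can be sketched as an ordering change: persist the assistant row before executing the tool, so a throwing tool can never leave an orphaned result. This is a minimal sketch with an in-memory stand-in for the DB table; the names (`MessageRow`, `runToolCall`) are illustrative, not the PR's actual API.

```typescript
type MessageRow = { role: 'assistant' | 'tool'; content: string }

async function runToolCall(
  store: MessageRow[], // stand-in for the agent_messages table
  toolCall: { name: string; args: string },
  execute: (args: string) => Promise<string>,
): Promise<void> {
  // 1. Persist the assistant tool-call row FIRST — thread replay stays
  //    consistent even if execution throws below.
  store.push({ role: 'assistant', content: JSON.stringify(toolCall) })
  try {
    const result = await execute(toolCall.args)
    store.push({ role: 'tool', content: result })
  } catch (err) {
    // 2. A failed tool still gets a result row, correctly paired with
    //    the assistant row persisted above.
    store.push({ role: 'tool', content: `error: ${(err as Error).message}` })
  }
}
```

With this ordering, the worst crash scenario is an assistant row with no result row, which the history-validation pass can detect and drop — the reverse (a result with no call) is what corrupts Claude replay.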
Security:
- `dns.resolve6` catch-all silently swallows errors (`sitemap-parser.ts`) — not exploitable but could incorrectly block IPv6-only hosts.
- Missing message length limit (covered above under Bugs).
Testing gap:
- ~830 lines of new agent code ship with zero unit tests. The loop's history-windowing, maxSteps fallback, and JSON recovery paths are all critical and unverified.
Performance:
- `getTimeline` has an N+1 query pattern; `getHistory` already demonstrates the correct bulk-fetch approach.
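The shape of the recommended fix — one bulk query followed by in-memory grouping — can be illustrated with a query-counting stub. `fetchSnapshotsFor` here stands in for a Drizzle `inArray(snapshots.runId, ids)` query; none of these names are from the PR.

```typescript
type Snapshot = { runId: string; data: string }

// Stub DB that counts round trips, to make the contrast measurable.
function makeDb(rows: Snapshot[]) {
  let queries = 0
  return {
    get queries() { return queries },
    // One round trip for any number of ids — the getHistory approach.
    fetchSnapshotsFor(ids: string[]): Snapshot[] {
      queries++
      return rows.filter((r) => ids.includes(r.runId))
    },
  }
}

function getTimeline(db: ReturnType<typeof makeDb>, runIds: string[]) {
  // Bulk-fetch once, then group in memory — instead of one query per run.
  const all = db.fetchSnapshotsFor(runIds)
  const byRun = new Map<string, Snapshot[]>()
  for (const id of runIds) byRun.set(id, [])
  for (const s of all) byRun.get(s.runId)?.push(s)
  return byRun
}
```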
What's done well ✅
- The SSRF hardening in `sitemap-parser.ts` is thorough: DNS resolution, IPv4/IPv6 loopback, link-local, ULA, and IPv4-mapped IPv6 addresses are all covered, with tests.
- History windowing (newest-N-ascending subquery) is the right approach and well-commented.
- Provider abstraction is clean — adding a new LLM is a one-function addition in `llm.ts`.
- Malformed JSON recovery in `convertToClaudeMessages` is a good defensive touch.
- DB schema with `ON DELETE CASCADE` + composite index on `(thread_id, created_at)` is solid.
Overall assessment: NEEDS_WORK on the tool-call persistence bug and the ApiClient/apiKey issues before this is safe to merge.
This review was generated by an AI agent. Please verify all suggestions.
- Fix tool-call persistence ordering: persist assistant row before
tool.execute() so DB is never left with orphaned tool results
- Guard against empty apiKey: return undefined instead of silently
constructing a broken handler
- Fall back to localhost:{port} when apiUrl is not configured so
self-hosted instances can use HTTP-backed agent tools
- Explicitly delete agent_messages before thread deletion (don't
rely on PRAGMA foreign_keys = ON)
- Add maxLength: 8000 on message body schema (Fastify/Ajv enforcement)
- Fix N+1 in getTimeline: bulk-fetch all snapshots with inArray
- Remove dead claude entry from PROVIDER_ENDPOINTS (uses dedicated path)
- Clean up duplicate projects import alias in server.ts
- Narrow dns.resolve6 catch to ENODATA/ENOTFOUND only
- Add CHECK constraint on agent_messages.role column
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
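The `maxLength: 8000` item in the checklist above amounts to a one-field change in the route's JSON schema, which Fastify validates with Ajv before the handler runs. A hypothetical sketch of the body schema — the exact field names and the `provider` enum are assumptions, not lifted from the PR:

```typescript
// Illustrative body schema for the send-message route. Fastify/Ajv rejects
// any request exceeding these bounds with a 400 before the handler runs,
// so an oversized message never reaches the LLM.
const sendMessageBodySchema = {
  type: 'object',
  required: ['message'],
  additionalProperties: false,
  properties: {
    // The cap that prevents a single payload from racking up token costs.
    message: { type: 'string', minLength: 1, maxLength: 8000 },
    provider: { type: 'string', enum: ['claude', 'openai', 'gemini'] },
  },
} as const
```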
Add a built-in AI agent that uses canonry's own tools to answer
AEO questions, run sweeps, and explain citation changes. No external
agent framework required — just the LLM provider already configured.
Architecture:
- Agent loop modeled after OpenClaw's pattern (LLM ↔ tool ↔ repeat)
- Uses existing provider API keys from canonry config
- Persistence in SQLite (same database, new tables)
- Provider priority: Claude > OpenAI > Gemini (configurable)
New files:
- packages/canonry/src/agent/ — core agent module
- loop.ts: LLM ↔ tool execution cycle
- llm.ts: provider-agnostic LLM layer (OpenAI, Claude, Gemini)
- tools.ts: canonry operations as LLM-callable functions
- store.ts: thread/message persistence (SQLite)
- prompt.ts: AEO analyst system prompt
- types.ts: shared type definitions
- packages/api-routes/src/agent.ts — REST API for chat
- packages/canonry/src/commands/agent.ts — CLI commands
CLI:
canonry agent ask <project> "message" — chat with the agent
canonry agent threads <project> — list threads
canonry agent thread <project> <id> — show thread history
API:
POST /api/v1/projects/:project/agent/threads — create thread
GET /api/v1/projects/:project/agent/threads — list threads
GET /api/v1/projects/:project/agent/threads/:id — get thread + messages
POST /api/v1/projects/:project/agent/threads/:id/messages — send message
DELETE /api/v1/projects/:project/agent/threads/:id — delete thread
Config:
agent:
provider: claude|openai|gemini (optional, auto-detects)
model: string (optional, uses provider default)
maxSteps: number (default: 10)
maxHistory: number (default: 30)
enabled: boolean (default: true if provider available)
Tools exposed to agent:
- get_status, run_sweep, get_evidence, get_timeline
- list_keywords, list_competitors, get_run_details
- get_gsc_performance, get_gsc_coverage, inspect_url
DB migration:
- agent_threads: conversation threads per project
- agent_messages: messages within threads (user/assistant/tool)
Closes #59
Fixes IDOR vulnerability where thread endpoints (get, send message, delete) accepted a :project param but never verified the thread belonged to that project. Now all three endpoints verify thread.projectId === project.id before allowing access. Addresses review comment #1 (Security - CRITICAL)
Wrap JSON.parse(toolCall.function.arguments) in try-catch to prevent crashes when LLMs return malformed JSON. On parse error, persist the error as a tool result and continue the agent loop instead of crashing. Addresses review comment #2 (Bug)
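The recovery path described in this commit can be sketched as a small helper: parse failures become a value the loop can persist as a tool result instead of an exception. `safeParseToolArgs` is an illustrative name, not the PR's actual function.

```typescript
// Malformed JSON from the LLM becomes a structured error instead of a crash.
function safeParseToolArgs(raw: string):
  | { ok: true; args: unknown }
  | { ok: false; error: string } {
  try {
    return { ok: true, args: JSON.parse(raw) }
  } catch (err) {
    // Persisted as the tool's result so the loop continues and the LLM
    // can see why its call failed and retry with valid JSON.
    return { ok: false, error: `Invalid tool arguments: ${(err as Error).message}` }
  }
}
```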
Replace dynamic imports of 'eq' and 'projects' table inside the message handler with static top-level imports to eliminate async overhead on every message. Addresses review comment #3 (Performance)
… layer

Create AgentServices class that provides direct DB access for agent tools, eliminating the circular dependency where tools called the server's own HTTP API. Most read-only tools (get_status, get_evidence, get_timeline, list_keywords, list_competitors, get_run_details) now use direct DB calls via AgentServices. Write operations (run_sweep) and external integrations (GSC) still use HTTP for proper job orchestration and auth handling.

Benefits:
- Eliminates ~1-5ms HTTP localhost roundtrip per tool call
- Removes startup timing dependency
- Simplifies auth config

Addresses review comment #4 (Architecture)
- P1: History windowing now returns newest N messages (was oldest N, causing long threads to drop the user's latest prompt)
- P1: SSRF validation now blocks localhost, IPv6 loopback/private, and resolves hostnames to verify they don't point to internal IPs
- P2: getRun() now requires projectName to prevent cross-project data access via known run IDs
- P2: getHistory() now queries snapshots for all returned runs (was only querying the first run ID)
- P2: convertToClaudeMessages() now handles malformed JSON in historical tool calls instead of crashing the thread

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
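The history-windowing fix above (newest N, returned ascending) reduces to a simple invariant that is easy to get backwards. A pure-logic sketch over an in-memory array — in SQL this is the newest-N subquery re-sorted ascending:

```typescript
type Msg = { id: number; content: string }

// Keep the NEWEST maxHistory messages, preserving ascending order, so a
// long thread never drops the user's latest prompt. Input is assumed
// ordered oldest → newest.
function windowHistory(messages: Msg[], maxHistory: number): Msg[] {
  if (maxHistory <= 0) return [] // slice(-0) would return everything
  return messages.slice(-maxHistory) // newest N, already ascending
}
```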
Critical fixes:
- Fix history truncation splitting tool-call pairs: trim orphaned tool/assistant messages at the window boundary
- Add per-thread concurrency guard (409 Conflict if thread is busy)
- Fix get_status returning oldest 3 runs (slice(-3) → slice(0,3))
- Resolve LLM config from registry at call time instead of capturing stale API key at startup
- Merge consecutive Claude tool results into single user message to avoid invalid same-role sequences

Important fixes:
- Add 20KB truncation cap on tool results to prevent blowing up LLM context window
- Guard against empty toolCalls array causing silent spin
- Add 90s timeout on all LLM fetch calls
- Return structured error responses (502 for LLM errors) instead of generic 500s
- Fix inconsistent return shape in getHistory (evidence → snapshots)
- Add maxLength/enum validation on thread title and channel fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
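Two of these fixes are compact enough to sketch together: the 20KB tool-result cap and the 90s LLM timeout. The truncation-marker text is an assumption, as is the exact constant name; the `AbortSignal.timeout` pattern is the standard way to bound a `fetch` in Node 17.3+.

```typescript
const MAX_TOOL_RESULT_CHARS = 20_000 // ~20KB cap from the commit above

// Truncate oversized tool results before they enter the LLM context window.
function capToolResult(result: string): string {
  if (result.length <= MAX_TOOL_RESULT_CHARS) return result
  return result.slice(0, MAX_TOOL_RESULT_CHARS) + '\n…[truncated]'
}

// The 90s timeout on LLM calls would then look roughly like:
//   const res = await fetch(llmUrl, {
//     method: 'POST',
//     signal: AbortSignal.timeout(90_000), // throws TimeoutError after 90s
//     body,
//   })
```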
- Rename the agent to "Aero" across CLI output and error messages
- Add soul.md as the agent's identity/personality definition (checked into the repo as the default, loaded from ~/.canonry/soul.md at runtime if the user wants to customize)
- Add memory.md as persistent context that Aero accumulates — loaded from ~/.canonry/memory.md at runtime so users can prime the agent with project-specific knowledge
- System prompt now composes: soul + project context + tools + memory
- Built-in soul is embedded in prompt.ts so it works after tsup bundling
- Agent remains fully optional: no background processes, only activates on explicit user request via CLI or API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Users can now choose which LLM provider Aero uses per message:
- CLI: canonry agent ask <project> "msg" --provider claude
- API: POST /agent/threads/:id/messages { message, provider: "gemini" }
The provider field is optional — omitting it uses the default
(configured in agent.provider or auto-detected: claude > openai > gemini).
If the requested provider isn't configured, returns a clear error.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds the /aero route with a full chat interface for interacting with the built-in Aero agent. Includes project selector, provider/model selector, thread management (create/delete), message display with optimistic rendering, and a thinking animation during API calls. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… background processing

Major Aero agent improvements:
- Memory: get_memory/save_memory tools with pre-seeded domain knowledge (citation states, provider grounding mechanics, regression detection)
- Startup sequence: auto-gathers context on new threads, responds naturally
- System tools (opt-in): run_command, read_file, write_file, list_files, http_request — gated behind agent.systemTools config flag
- Write tools: add/remove keywords, add/remove competitors, update_project
- Background processing: send-message returns 202, UI polls for completion, agent work survives page navigation
- Chat UI: markdown rendering, auto-expanding textarea, inline thread rename, relative dates, cleaner thread list, no page scroll
- Claude API fix: bidirectional tool_use/tool_result validation prevents orphaned blocks from corrupting conversation history
- CLI polling: agent ask now polls thread status instead of blocking
- Remove unused footer, PATCH endpoint for thread rename, auto-titling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…quest size logging

- Reduce default maxHistoryMessages from 30 to 20 (fewer stale messages)
- Compress tool results older than 8 rows to 500 chars to prevent large get_evidence/get_memory results from inflating every subsequent request
- Add stderr logging per request: ~N tokens (M chars, K messages) for debugging
- Version 1.17.0 → 1.18.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…construction

The previous two-pass validation approach had edge cases where the passes interacted in ways that still left orphaned tool_result blocks (causing Claude 400 errors at messages.0.content.0).

New approach: a state machine that walks the OpenAI-format messages once and only emits a tool-call group (assistant+tool_use → user+tool_result) when ALL tool_use blocks have matching tool_result blocks. Incomplete groups from truncated history or server crashes are dropped entirely. Consecutive same-role messages are merged at the end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
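A simplified sketch of that single-pass grouping (the same-role merge pass is omitted). Message shapes are illustrative OpenAI-style roles, not the PR's actual types:

```typescript
type OaMsg =
  | { role: 'user' | 'assistant'; content: string; tool_calls?: { id: string }[] }
  | { role: 'tool'; tool_call_id: string; content: string }

// Emit an assistant turn with tool calls only when EVERY call id has a
// matching tool result; incomplete groups (truncated history, crashes)
// are dropped entirely so Claude never sees an orphaned block.
function dropIncompleteToolGroups(messages: OaMsg[]): OaMsg[] {
  const out: OaMsg[] = []
  let i = 0
  while (i < messages.length) {
    const m = messages[i]
    if (m.role === 'assistant' && m.tool_calls?.length) {
      // Collect the contiguous run of tool results after this turn.
      const results: OaMsg[] = []
      let j = i + 1
      while (j < messages.length && messages[j].role === 'tool') {
        results.push(messages[j])
        j++
      }
      const resultIds = new Set(
        results.map((r) => (r as { tool_call_id: string }).tool_call_id),
      )
      if (m.tool_calls.every((c) => resultIds.has(c.id))) {
        out.push(m, ...results) // complete group: emit as a unit
      }
      // else: drop assistant turn and its orphaned results
      i = j
    } else {
      out.push(m)
      i++
    }
  }
  return out
}
```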
Users can now pick a specific model (e.g. Sonnet vs Opus) from the chat UI when a provider is selected. This avoids rate limit issues when the provider-level config is set to a model with low rate limits. Model priority: request model > agent config > provider config > default. Also syncs DEFAULT_MODELS in llm.ts with MODEL_REGISTRY from contracts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve conflicts: keep both indexing API (from main) and agent features (from feat/agent). Version stays at 1.19.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unsure that this is necessary in Canonry at the moment. The real power of canonry is leveraged through an "openclaw" type agent that has full accessibility to the host operating system with a well-constructed memory. This agent UI feels like any other agent UI. Not a fan so far.
process.stdout.write('Aero is thinking...')
}

await client.sendAgentMessage(project, threadId, message, opts?.provider)
Missing error handling on sendAgentMessage: If this call throws (e.g. network error, 4xx/5xx), the exception propagates uncaught and the polling loop below never runs — which is fine. However, if the server accepts the message but never transitions out of 'processing', the loop silently times out and returns an empty string (see lines 62–80). A tighter pattern would be to await this call, then check the thread status before entering the poll loop.
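The tighter pattern could look roughly like this — surface the send failure explicitly, then confirm thread status before polling. The client shape here is hypothetical; the real ApiClient methods take project/thread arguments:

```typescript
type ThreadStatus = 'processing' | 'idle' | 'error'

async function sendThenCheck(client: {
  sendAgentMessage(): Promise<void>
  getThreadStatus(): Promise<ThreadStatus>
}): Promise<{ ok: true; status: ThreadStatus } | { ok: false; error: string }> {
  try {
    await client.sendAgentMessage()
  } catch (err) {
    // Network errors / 4xx / 5xx no longer propagate uncaught.
    return { ok: false, error: `send failed: ${(err as Error).message}` }
  }
  const status = await client.getThreadStatus()
  if (status === 'error') {
    return { ok: false, error: 'thread entered error state before polling' }
  }
  return { ok: true, status } // safe to enter the poll loop now
}
```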
if (opts?.format !== 'json') process.stdout.write('.')
}

if (opts?.format !== 'json') console.log('\n')
Silent timeout — no error signal when the agent takes >3 minutes: When `i` reaches 120 (120 × 1500ms = 3 min), the loop exits without setting process.exitCode or printing an error. The caller receives an empty response string and exit code 0, which looks like success.
// After the loop:
if (!response) {
  console.error('\nTimed out waiting for agent response (3 min).')
  process.exitCode = 1
  return
}

await validateSitemapUrl(url)
}

const res = await fetch(url)
DNS rebinding / TOCTOU: The DNS check in validateSitemapUrl (lines 66–88) happens before fetch(url). A malicious DNS server can return a public IP during validation, then switch to a private IP for the actual fetch request — this is a classic DNS rebinding attack.
Full mitigation requires resolving the hostname to an IP, asserting it's public, then connecting to that specific IP directly (e.g. by passing a custom agent to fetch that pins the resolved IP). The current approach significantly raises the bar vs. the old static regex check, but it is not bulletproof. Worth adding a comment documenting this known limitation so it's not mistaken for a complete fix.
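One way to implement that pinning, sketched under assumptions: `isPublicIp` is assumed to exist (the PR's validator plays this role), and `node:http`'s documented `lookup` request option is used instead of global `fetch`, since `fetch` offers no hook to pin DNS. Resolve once, validate the address, then force every socket for the request to connect to that exact IP:

```typescript
import http from 'node:http'
import dns from 'node:dns/promises'

// Build a lookup function that always resolves to one pre-validated IP.
function makePinnedLookup(address: string, family: number) {
  return (
    _hostname: string,
    _options: unknown,
    callback: (err: Error | null, address: string, family: number) => void,
  ) => callback(null, address, family)
}

async function fetchPinned(url: string, isPublicIp: (ip: string) => boolean) {
  const { hostname } = new URL(url)
  const { address, family } = await dns.lookup(hostname)
  if (!isPublicIp(address)) throw new Error(`Blocked private address: ${address}`)
  return new Promise<http.IncomingMessage>((resolve, reject) => {
    // The pinned lookup closes the validate-then-connect TOCTOU window:
    // the socket connects to the IP we just validated, so a DNS server
    // that flips answers between checks gains nothing.
    http
      .get(url, { lookup: makePinnedLookup(address, family) as any }, resolve)
      .on('error', reject)
  })
}
```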
const services = new AgentServices(db)

// ApiClient is only needed for HTTP-backed tools (run_sweep, GSC).
// If apiUrl/apiKey aren't set (self-hosted), those tools will gracefully error.
Security: systemTools defaults to false — good. Consider adding a prominent warning when it's enabled.
When agent.systemTools: true, the built-in agent gains shell execution, file I/O, and HTTP request capabilities. Since the agent operates on user-supplied message content, an adversarial input could abuse these tools. At minimum, log a startup warning when systemTools is true so operators notice, and document clearly in the config schema that this is a dangerous option.
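The minimum-bar warning suggested here is a few lines at startup; the `agent.systemTools` field name comes from the PR, while the message wording and function name are ours:

```typescript
// Emit a loud operator-facing warning when the dangerous flag is on.
function warnOnSystemTools(
  config: { agent?: { systemTools?: boolean } },
  log: (msg: string) => void,
) {
  if (config.agent?.systemTools) {
    log(
      'WARNING: agent.systemTools is enabled — the agent can run shell ' +
        'commands, read/write files, and make HTTP requests on this host. ' +
        'User-supplied messages can trigger these tools.',
    )
  }
}
```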
arberx
left a comment
🤖 Automated Review — PR #74 (incremental: new commits since last review)
Summary: Solid agent foundation. The LLM loop, tool system, and CLI commands are well-structured. A few issues flagged inline:
| Severity | Finding | File |
|---|---|---|
| 🟠 Bug | Silent timeout — loop exits with exitCode=0 + empty response after 3 min | `commands/agent.ts:83` |
| 🟠 Bug | `--wait` polls `indexingState` (wrong field — means allowed, not indexed) | `commands/google.ts` (already in main) |
| 🔴 Security | DNS rebinding TOCTOU in `validateSitemapUrl` | `sitemap-parser.ts:110` |
| 🟡 Robustness | `sendAgentMessage` failure not linked to poll-loop lifecycle | `commands/agent.ts:58` |
| 🟡 Security | `systemTools: true` silently grants shell/file/network to agent | `server.ts:509` |
The --wait / INDEXING_ALLOWED bug is already on main — worth a hotfix or follow-up issue independent of this PR.
Summary
- CLI command (`canonry agent`) for interactive terminal chat sessions

Commits
- feat: built-in agent — LLM-powered AEO analyst with chat API — core agent loop, tools, store, routes, CLI command, DB migrations
- fix(security): Add project ownership verification to thread endpoints
- fix(agent): Add error handling for malformed JSON in tool call arguments
- perf(agent): Move dynamic imports to top-level
- refactor(agent): Replace circular HTTP self-calls with direct service layer
- style(agent): Remove dead code and unused types
- fix(agent): fix 5 bugs in agent loop, SSRF validation, and services — history windowing, SSRF validation, project-scoped getRun, complete getHistory, Claude replay fix

Test plan
- `canonry agent` interactive session against a project with runs

🤖 Generated with Claude Code