
feat: Built-in agent — LLM-powered AEO analyst with chat API #74

Draft
arberx wants to merge 17 commits into main from feat/agent

Conversation


@arberx arberx commented Mar 16, 2026

Summary

  • Built-in LLM-powered agent with chat API, thread persistence in SQLite, and 9 domain-specific AEO tools (visibility runs, timeline analysis, evidence lookup, competitor analysis, keyword management, sitemap inspection, etc.)
  • CLI command (canonry agent) for interactive terminal chat sessions
  • Multi-provider support — agent threads can use OpenAI, Claude, or Gemini as the backing LLM
  • Security hardening — project ownership verification on thread endpoints, SSRF protection with DNS resolution on sitemap inspection, project-scoped run access
  • Bug fixes — history windowing (newest messages kept, not oldest), complete evidence across all runs, malformed JSON recovery for Claude message replay

Commits

  1. feat: built-in agent — LLM-powered AEO analyst with chat API — core agent loop, tools, store, routes, CLI command, DB migrations
  2. fix(security): Add project ownership verification to thread endpoints
  3. fix(agent): Add error handling for malformed JSON in tool call arguments
  4. perf(agent): Move dynamic imports to top-level
  5. refactor(agent): Replace circular HTTP self-calls with direct service layer
  6. style(agent): Remove dead code and unused types
  7. fix(agent): fix 5 bugs in agent loop and services — history windowing, SSRF validation, project-scoped getRun, complete getHistory, Claude replay fix

Test plan

  • CI passes (typecheck, lint, test)
  • SSRF validation tests cover IPv6 loopback, link-local, ULA, localhost, DNS resolution
  • Manual: canonry agent interactive session against a project with runs
  • Manual: verify thread persistence across server restarts

🤖 Generated with Claude Code


@arberx arberx left a comment


🤖 Automated Review Summary

Files reviewed: 21
Comments left: 13

Issues found:

  • 🔴 Bug: 5
  • 🟡 Security: 2
  • 🟠 Performance: 1
  • 🔵 Type Safety: 2
  • 🟣 Testing: 1
  • ⚪ Style: 1
  • ⚪ Dead code: 1

Key findings

Bugs (fix before merge):

  1. Tool-call persistence ordering (loop.ts ~line 188) — assistant tool-call rows are stored after tool.execute() runs. If execution throws, the DB ends up with a tool result row but no matching assistant row, corrupting thread replay for Claude.
  2. Empty apiKey fallback (server.ts ~line 499) — apiKey ?? '' means a misconfigured provider silently constructs a working handler that returns 401 on every LLM call instead of returning undefined to disable the agent.
  3. ApiClient with undefined apiUrl/apiKey (server.ts ~line 503) — self-hosted instances without apiUrl set will get silent failures on run_sweep and all GSC tools.
  4. Orphaned messages on thread delete (agent.ts ~line 220) — ON DELETE CASCADE only fires if PRAGMA foreign_keys = ON, which is not guaranteed.
  5. Unbounded message field (agent.ts ~line 158) — no maxLength on the message body; a single large payload can rack up LLM token costs.
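Bug 1 comes down to ordering: persist the assistant tool-call row before executing the tool, so a throwing tool can never leave a result row without its matching assistant row. A minimal sketch; all names (`MessageRow`, `saveMessage`, `runToolCall`) are illustrative stand-ins, not the PR's actual API:

```typescript
// Stand-in for the SQLite message store: an in-memory array of rows.
type MessageRow = { role: 'assistant' | 'tool'; content: string };

const db: MessageRow[] = [];
const saveMessage = (row: MessageRow) => { db.push(row); };

async function runToolCall(
  name: string,
  args: string,
  execute: (args: string) => Promise<string>,
): Promise<void> {
  // 1. Persist the assistant tool-call row BEFORE executing the tool.
  saveMessage({ role: 'assistant', content: `call:${name}(${args})` });
  let result: string;
  try {
    result = await execute(args);
  } catch (err) {
    // 2. A failing tool still produces a matching tool-result row,
    //    so thread replay never sees an orphaned half of the pair.
    result = `error: ${(err as Error).message}`;
  }
  saveMessage({ role: 'tool', content: result });
}

// Demo: even when the tool throws, the store ends up with a valid pair.
const demo = runToolCall('get_status', '{}', async () => {
  throw new Error('boom');
});
```

With the original ordering (execute first, persist after), the same failing tool would leave a result row with no assistant row, which is exactly the replay corruption described above.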

Security:

  • dns.resolve6 catch-all silently swallows errors (sitemap-parser.ts) — not exploitable but could incorrectly block IPv6-only hosts.
  • Missing message length limit (covered above under Bugs).

Testing gap:

  • ~830 lines of new agent code ship with zero unit tests. The loop's history-windowing, maxSteps fallback, and JSON recovery paths are all critical and unverified.

Performance:

  • getTimeline has an N+1 query pattern; getHistory already demonstrates the correct bulk-fetch approach.
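The bulk-fetch shape referred to here, one query for all run IDs instead of one query per run, looks roughly like this; `fetchSnapshots` is a hypothetical stand-in for a single Drizzle `inArray()` query:

```typescript
type Snapshot = { runId: string; data: string };

// N+1 fix sketch: collect all run ids up front, issue ONE bulk query,
// then group the rows in memory. fetchSnapshots is illustrative, not
// the PR's actual service method.
async function getTimelineSnapshots(
  runIds: string[],
  fetchSnapshots: (ids: string[]) => Promise<Snapshot[]>,
): Promise<Map<string, Snapshot[]>> {
  const all = await fetchSnapshots(runIds); // one query, not runIds.length
  const byRun = new Map<string, Snapshot[]>();
  for (const s of all) {
    const list = byRun.get(s.runId);
    if (list) list.push(s);
    else byRun.set(s.runId, [s]);
  }
  return byRun;
}
```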

What's done well ✅

  • The SSRF hardening in sitemap-parser.ts is thorough: DNS resolution, IPv4/IPv6 loopback, link-local, ULA, and IPv4-mapped IPv6 addresses are all covered, with tests.
  • History windowing (newest-N-ascending subquery) is the right approach and well-commented.
  • Provider abstraction is clean — adding a new LLM is a one-function addition in llm.ts.
  • Malformed JSON recovery in convertToClaudeMessages is a good defensive touch.
  • DB schema with ON DELETE CASCADE + composite index on (thread_id, created_at) is solid.
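The newest-N-ascending windowing praised above reduces, in array form, to "take the last N of an ascending list". The PR does this in SQL with a DESC/LIMIT subquery re-sorted ascending; this toy sketch shows the equivalent logic:

```typescript
// Equivalent to:
//   SELECT * FROM (SELECT ... ORDER BY created_at DESC LIMIT :n)
//   ORDER BY created_at ASC
// i.e. keep the NEWEST n messages but hand them to the LLM in
// chronological order.
type Msg = { id: number; text: string };

function windowHistory(messages: Msg[], maxHistory: number): Msg[] {
  // `messages` is assumed already sorted ascending by created_at.
  return messages.slice(-maxHistory);
}

const msgs = Array.from({ length: 5 }, (_, i) => ({ id: i + 1, text: `m${i + 1}` }));
console.log(windowHistory(msgs, 3).map((m) => m.id)); // [ 3, 4, 5 ]
```

The original bug (oldest N kept) corresponds to `slice(0, maxHistory)`, which on a long thread drops the user's latest prompt.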

Overall assessment: NEEDS_WORK on the tool-call persistence bug and the ApiClient/apiKey issues before this is safe to merge.

This review was generated by an AI agent. Please verify all suggestions.

arberx added a commit that referenced this pull request Mar 17, 2026
- Fix tool-call persistence ordering: persist assistant row before
  tool.execute() so DB is never left with orphaned tool results
- Guard against empty apiKey: return undefined instead of silently
  constructing a broken handler
- Fall back to localhost:{port} when apiUrl is not configured so
  self-hosted instances can use HTTP-backed agent tools
- Explicitly delete agent_messages before thread deletion (don't
  rely on PRAGMA foreign_keys = ON)
- Add maxLength: 8000 on message body schema (Fastify/Ajv enforcement)
- Fix N+1 in getTimeline: bulk-fetch all snapshots with inArray
- Remove dead claude entry from PROVIDER_ENDPOINTS (uses dedicated path)
- Clean up duplicate projects import alias in server.ts
- Narrow dns.resolve6 catch to ENODATA/ENOTFOUND only
- Add CHECK constraint on agent_messages.role column

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
arberx and others added 12 commits March 17, 2026 14:02
Add a built-in AI agent that uses canonry's own tools to answer
AEO questions, run sweeps, and explain citation changes. No external
agent framework required — just the LLM provider already configured.

Architecture:
- Agent loop modeled after OpenClaw's pattern (LLM ↔ tool ↔ repeat)
- Uses existing provider API keys from canonry config
- Persistence in SQLite (same database, new tables)
- Provider priority: Claude > OpenAI > Gemini (configurable)

New files:
- packages/canonry/src/agent/ — core agent module
  - loop.ts: LLM ↔ tool execution cycle
  - llm.ts: provider-agnostic LLM layer (OpenAI, Claude, Gemini)
  - tools.ts: canonry operations as LLM-callable functions
  - store.ts: thread/message persistence (SQLite)
  - prompt.ts: AEO analyst system prompt
  - types.ts: shared type definitions
- packages/api-routes/src/agent.ts — REST API for chat
- packages/canonry/src/commands/agent.ts — CLI commands

CLI:
  canonry agent ask <project> "message"    — chat with the agent
  canonry agent threads <project>           — list threads
  canonry agent thread <project> <id>       — show thread history

API:
  POST   /api/v1/projects/:project/agent/threads              — create thread
  GET    /api/v1/projects/:project/agent/threads              — list threads
  GET    /api/v1/projects/:project/agent/threads/:id          — get thread + messages
  POST   /api/v1/projects/:project/agent/threads/:id/messages — send message
  DELETE /api/v1/projects/:project/agent/threads/:id          — delete thread

Config:
  agent:
    provider: claude|openai|gemini  (optional, auto-detects)
    model: string                   (optional, uses provider default)
    maxSteps: number                (default: 10)
    maxHistory: number              (default: 30)
    enabled: boolean                (default: true if provider available)

Tools exposed to agent:
  - get_status, run_sweep, get_evidence, get_timeline
  - list_keywords, list_competitors, get_run_details
  - get_gsc_performance, get_gsc_coverage, inspect_url

DB migration:
  - agent_threads: conversation threads per project
  - agent_messages: messages within threads (user/assistant/tool)

Closes #59
Fixes IDOR vulnerability where thread endpoints (get, send message, delete)
accepted a :project param but never verified the thread belonged to that project.

Now all three endpoints verify thread.projectId === project.id before allowing access.

Addresses review comment #1 (Security - CRITICAL)
Wrap JSON.parse(toolCall.function.arguments) in try-catch to prevent crashes
when LLMs return malformed JSON. On parse error, persist the error as a tool
result and continue the agent loop instead of crashing.

Addresses review comment #2 (Bug)
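The guard this commit describes amounts to wrapping the parse and converting failure into a tool result instead of a crash. A sketch with illustrative names:

```typescript
// Malformed-JSON recovery sketch: LLMs occasionally emit truncated or
// invalid JSON for tool-call arguments. On parse failure, return a
// structured error the loop can persist as the tool result and continue.
type ParseResult =
  | { ok: true; args: unknown }
  | { ok: false; error: string };

function parseToolArgs(raw: string): ParseResult {
  try {
    return { ok: true, args: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: `Malformed tool arguments: ${(err as Error).message}` };
  }
}

console.log(parseToolArgs('{"url": "https://example.com"}').ok); // true
console.log(parseToolArgs('{"url": ').ok); // false
```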
Replace dynamic imports of 'eq' and 'projects' table inside the message handler
with static top-level imports to eliminate async overhead on every message.

Addresses review comment #3 (Performance)
… layer

Create AgentServices class that provides direct DB access for agent tools,
eliminating the circular dependency where tools called the server's own HTTP API.

Most read-only tools (get_status, get_evidence, get_timeline, list_keywords,
list_competitors, get_run_details) now use direct DB calls via AgentServices.

Write operations (run_sweep) and external integrations (GSC) still use HTTP
for proper job orchestration and auth handling.

Benefits:
- Eliminates ~1-5ms HTTP localhost roundtrip per tool call
- Removes startup timing dependency
- Simplifies auth config

Addresses review comment #4 (Architecture)
- Remove redundant if/else in llm.ts that set the same Authorization header
- Remove unused ToolDefinition interface (actual interface is AgentTool)

Addresses review comments #5 and #6 (Style)
- P1: History windowing now returns newest N messages (was oldest N,
  causing long threads to drop the user's latest prompt)
- P1: SSRF validation now blocks localhost, IPv6 loopback/private,
  and resolves hostnames to verify they don't point to internal IPs
- P2: getRun() now requires projectName to prevent cross-project
  data access via known run IDs
- P2: getHistory() now queries snapshots for all returned runs
  (was only querying the first run ID)
- P2: convertToClaudeMessages() now handles malformed JSON in
  historical tool calls instead of crashing the thread

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
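The SSRF address checks from the P1 fix above can be sketched as a single predicate. This is a simplification: the real validateSitemapUrl also resolves hostnames via DNS before checking, and the exact range list here is illustrative:

```typescript
// Simplified SSRF address check covering the classes the fix describes:
// IPv4 loopback/RFC 1918/link-local, IPv6 loopback/link-local/ULA, and
// IPv4-mapped IPv6. Not exhaustive (e.g. 0.0.0.0 is not handled).
function isBlockedAddress(ip: string): boolean {
  const v4 = ip.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (v4) {
    const a = Number(v4[1]);
    const b = Number(v4[2]);
    if (a === 127) return true;                          // loopback
    if (a === 10) return true;                           // RFC 1918
    if (a === 172 && b >= 16 && b <= 31) return true;    // RFC 1918
    if (a === 192 && b === 168) return true;             // RFC 1918
    if (a === 169 && b === 254) return true;             // link-local
    return false;
  }
  const v6 = ip.toLowerCase();
  if (v6 === '::1') return true;                         // IPv6 loopback
  if (v6.startsWith('fe80:')) return true;               // link-local
  if (v6.startsWith('fc') || v6.startsWith('fd')) return true; // ULA
  if (v6.startsWith('::ffff:')) return isBlockedAddress(v6.slice(7)); // IPv4-mapped
  return false;
}

console.log(isBlockedAddress('127.0.0.1'), isBlockedAddress('8.8.8.8')); // true false
```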
Critical fixes:
- Fix history truncation splitting tool-call pairs: trim orphaned
  tool/assistant messages at the window boundary
- Add per-thread concurrency guard (409 Conflict if thread is busy)
- Fix get_status returning oldest 3 runs (slice(-3) → slice(0,3))
- Resolve LLM config from registry at call time instead of capturing
  stale API key at startup
- Merge consecutive Claude tool results into single user message to
  avoid invalid same-role sequences

Important fixes:
- Add 20KB truncation cap on tool results to prevent blowing up
  LLM context window
- Guard against empty toolCalls array causing silent spin
- Add 90s timeout on all LLM fetch calls
- Return structured error responses (502 for LLM errors) instead of
  generic 500s
- Fix inconsistent return shape in getHistory (evidence → snapshots)
- Add maxLength/enum validation on thread title and channel fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
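The 20KB tool-result cap above is a straight truncation. The limit matches the commit; the marker text below is an assumption for illustration:

```typescript
// Cap on tool results before they are appended to the LLM context, so a
// single large get_evidence payload cannot blow up every later request.
const MAX_TOOL_RESULT_CHARS = 20_000;

function capToolResult(result: string): string {
  if (result.length <= MAX_TOOL_RESULT_CHARS) return result;
  // Marker text is illustrative, not copied from the PR.
  return result.slice(0, MAX_TOOL_RESULT_CHARS) + '\n…[truncated]';
}
```

Applying the cap per tool result (rather than per request) means one oversized tool cannot crowd out the rest of the window.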
- Rename the agent to "Aero" across CLI output and error messages
- Add soul.md as the agent's identity/personality definition (checked
  into repo as the default, loaded from ~/.canonry/soul.md at runtime
  if the user wants to customize)
- Add memory.md as persistent context that Aero accumulates — loaded
  from ~/.canonry/memory.md at runtime so users can prime the agent
  with project-specific knowledge
- System prompt now composes: soul + project context + tools + memory
- Built-in soul is embedded in prompt.ts so it works after tsup bundling
- Agent remains fully optional: no background processes, only activates
  on explicit user request via CLI or API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Users can now choose which LLM provider Aero uses per message:

- CLI: canonry agent ask <project> "msg" --provider claude
- API: POST /agent/threads/:id/messages { message, provider: "gemini" }

The provider field is optional — omitting it uses the default
(configured in agent.provider or auto-detected: claude > openai > gemini).
If the requested provider isn't configured, returns a clear error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds the /aero route with a full chat interface for interacting with the
built-in Aero agent. Includes project selector, provider/model selector,
thread management (create/delete), message display with optimistic
rendering, and a thinking animation during API calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
arberx and others added 5 commits March 17, 2026 15:12
… background processing

Major Aero agent improvements:

- Memory: get_memory/save_memory tools with pre-seeded domain knowledge
  (citation states, provider grounding mechanics, regression detection)
- Startup sequence: auto-gathers context on new threads, responds naturally
- System tools (opt-in): run_command, read_file, write_file, list_files,
  http_request — gated behind agent.systemTools config flag
- Write tools: add/remove keywords, add/remove competitors, update_project
- Background processing: send-message returns 202, UI polls for completion,
  agent work survives page navigation
- Chat UI: markdown rendering, auto-expanding textarea, inline thread rename,
  relative dates, cleaner thread list, no page scroll
- Claude API fix: bidirectional tool_use/tool_result validation prevents
  orphaned blocks from corrupting conversation history
- CLI polling: agent ask now polls thread status instead of blocking
- Remove unused footer, PATCH endpoint for thread rename, auto-titling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
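The 202-then-poll flow described above can be sketched generically; `getStatus` is a hypothetical stand-in for the client's thread-status call, and the interval/attempt numbers are illustrative:

```typescript
// Poll-until-done sketch: after send-message returns 202, the CLI/UI polls
// thread status until it leaves 'processing' or a deadline passes.
async function pollThread(
  getStatus: () => Promise<'processing' | 'done' | 'error'>,
  { intervalMs = 10, maxAttempts = 20 } = {},
): Promise<'done' | 'error' | 'timeout'> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await getStatus();
    if (status !== 'processing') return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  // Caller should treat this as a failure (non-zero exit code), not success.
  return 'timeout';
}

// Demo: the fake status call reports 'processing' twice, then 'done'.
let calls = 0;
const demoPoll = pollThread(async () => (++calls < 3 ? 'processing' : 'done'));
```

Because the work is keyed to the thread rather than the HTTP request, the same polling survives page navigation in the UI.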
…quest size logging

- Reduce default maxHistoryMessages from 30 to 20 (fewer stale messages)
- Compress tool results older than 8 rows to 500 chars to prevent large
  get_evidence/get_memory results from inflating every subsequent request
- Add stderr logging per request: ~N tokens (M chars, K messages) for debugging
- Version 1.17.0 → 1.18.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…construction

The previous two-pass validation approach had edge cases where the passes
interacted in ways that still left orphaned tool_result blocks (causing
Claude 400 errors at messages.0.content.0).

New approach: state machine that walks the OpenAI-format messages once and
only emits a tool call group (assistant+tool_use → user+tool_result) when
ALL tool_use blocks have matching tool_result blocks. Incomplete groups from
truncated history or server crashes are dropped entirely. Consecutive
same-role messages are merged at the end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
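The single-pass rule described above, emit an assistant tool-call group only when every tool_use id has a matching tool_result, can be sketched on simplified message shapes (the real conversion also handles user/assistant text and merges consecutive same-role messages):

```typescript
// Simplified stand-ins for the OpenAI-format rows the PR walks over.
type OpenAIMsg =
  | { role: 'assistant'; toolCallIds: string[] }
  | { role: 'tool'; toolCallId: string };

// One pass: for each assistant tool-call turn, gather the tool results
// that follow it; emit the whole group only if every call id is matched.
// Incomplete groups (truncated history, crashes) are dropped entirely,
// and leading orphaned tool results are skipped.
function emitCompleteGroups(msgs: OpenAIMsg[]): OpenAIMsg[] {
  const out: OpenAIMsg[] = [];
  for (let i = 0; i < msgs.length; i++) {
    const m = msgs[i];
    if (m.role !== 'assistant') continue; // orphaned tool result: skip
    const results: OpenAIMsg[] = [];
    let j = i + 1;
    while (j < msgs.length && msgs[j].role === 'tool') results.push(msgs[j++]);
    const resultIds = new Set(
      results.map((r) => (r as { toolCallId: string }).toolCallId),
    );
    if (m.toolCallIds.every((id) => resultIds.has(id))) out.push(m, ...results);
    i = j - 1; // resume after the consumed results
  }
  return out;
}
```

This is what prevents the Claude 400 at messages.0.content.0: an unmatched tool_use or tool_result never reaches the API at all.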
Users can now pick a specific model (e.g. Sonnet vs Opus) from the chat
UI when a provider is selected. This avoids rate limit issues when the
provider-level config is set to a model with low rate limits.

Model priority: request model > agent config > provider config > default.
Also syncs DEFAULT_MODELS in llm.ts with MODEL_REGISTRY from contracts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
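The stated priority (request model > agent config > provider config > default) is a nullish-coalescing chain; argument names here are illustrative:

```typescript
// Model resolution sketch: ?? falls through only on null/undefined, so any
// explicitly configured value wins over the fallbacks to its right.
function resolveModel(
  requestModel: string | undefined,
  agentConfigModel: string | undefined,
  providerConfigModel: string | undefined,
  providerDefault: string,
): string {
  return requestModel ?? agentConfigModel ?? providerConfigModel ?? providerDefault;
}
```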
Resolve conflicts: keep both indexing API (from main) and agent
features (from feat/agent). Version stays at 1.19.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

arberx commented Mar 18, 2026

Unsure that this is necessary in Canonry at the moment.

The real power of Canonry is leveraged through an "openclaw"-type agent that has full access to the host operating system and a well-constructed memory.

This agent UI feels like any other agent UI. Not a fan so far.

@arberx arberx marked this pull request as draft March 18, 2026 00:00
process.stdout.write('Aero is thinking...')
}

await client.sendAgentMessage(project, threadId, message, opts?.provider)

Missing error handling on sendAgentMessage: If this call throws (e.g. network error, 4xx/5xx), the exception propagates uncaught and the polling loop below never runs — which is fine. However, if the server accepts the message but never transitions out of 'processing', the loop silently times out and returns an empty string (see line 62–80). A tighter pattern would be to await this, and then check the thread status before entering the poll loop.

if (opts?.format !== 'json') process.stdout.write('.')
}

if (opts?.format !== 'json') console.log('\n')

Silent timeout — no error signal when the agent takes >3 minutes: When i reaches 120 (120 × 1500ms = 3 min), the loop exits without setting process.exitCode or printing an error. The caller receives an empty response string and exit code 0, which looks like success.

// After the loop:
if (!response) {
  console.error('\nTimed out waiting for agent response (3 min).')
  process.exitCode = 1
  return
}

await validateSitemapUrl(url)
}

const res = await fetch(url)

DNS rebinding / TOCTOU: The DNS check in validateSitemapUrl (lines 66–88) happens before fetch(url). A malicious DNS server can return a public IP during validation, then switch to a private IP for the actual fetch request — this is a classic DNS rebinding attack.

Full mitigation requires resolving the hostname to an IP, asserting it's public, then connecting to that specific IP directly (e.g. by passing a custom agent to fetch that pins the resolved IP). The current approach significantly raises the bar vs. the old static regex check, but it is not bulletproof. Worth adding a comment documenting this known limitation so it's not mistaken for a complete fix.

const services = new AgentServices(db)

// ApiClient is only needed for HTTP-backed tools (run_sweep, GSC).
// If apiUrl/apiKey aren't set (self-hosted), those tools will gracefully error.

Security: systemTools defaults to false — good. Consider adding a prominent warning when it's enabled.

When agent.systemTools: true, the built-in agent gains shell execution, file I/O, and HTTP request capabilities. Since the agent operates on user-supplied message content, an adversarial input could abuse these tools. At minimum, log a startup warning when systemTools is true so operators notice, and document clearly in the config schema that this is a dangerous option.
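The suggested startup warning could look like the following sketch; the config shape and message wording are assumptions, not the PR's code:

```typescript
// Log a loud operator-facing warning when the dangerous flag is on.
interface AgentConfig {
  systemTools?: boolean;
}

function warnIfSystemToolsEnabled(
  config: AgentConfig,
  log: (msg: string) => void, // e.g. console.error or a Fastify logger
): void {
  if (config.systemTools) {
    log(
      'WARNING: agent.systemTools is enabled — the agent can run shell ' +
        'commands, read/write files, and make HTTP requests. Only enable ' +
        'this when all agent input is trusted.',
    );
  }
}
```

Calling this once at server startup makes the capability grant visible in logs instead of silent.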


@arberx arberx left a comment


🤖 Automated Review — PR #74 (incremental: new commits since last review)

Summary: Solid agent foundation. The LLM loop, tool system, and CLI commands are well-structured. A few issues flagged inline:

Severity | Finding | File
🟠 Bug | Silent timeout — loop exits with exitCode=0 + empty response after 3 min | commands/agent.ts:83
🟠 Bug | --wait polls indexingState (wrong field — means allowed, not indexed) | commands/google.ts (already in main)
🔴 Security | DNS rebinding TOCTOU in validateSitemapUrl | sitemap-parser.ts:110
🟡 Robustness | sendAgentMessage failure not linked to poll-loop lifecycle | commands/agent.ts:58
🟡 Security | systemTools: true silently grants shell/file/network to agent | server.ts:509

The --wait / INDEXING_ALLOWED bug is already on main — worth a hotfix or follow-up issue independent of this PR.

