diff --git a/docs/TODO.md b/docs/TODO.md index a70e0bfbd..097620200 100644 --- a/docs/TODO.md +++ b/docs/TODO.md @@ -1,25 +1,140 @@ # Documentation TODO -## Installation -- [x] Add Homebrew install instructions for Mac/Linux -- [x] Mention that cagent comes pre-installed with Docker Desktop -- [x] Explain why `task build-local` is recommended on Windows (Docker Buildx cross-compilation) +## New Pages -## Quick Start -- [x] Clarify that Step 1 (install binary) and Step 2 (build from source) are two alternative options, not sequential steps -- [x] Add a path to get started from a built-in example (`cagent run examples/...`) or the default agent -- [x] Show running an agent from the registry (`cagent run agentcatalog/...`) +- [x] **Go SDK** — Not documented anywhere. The `examples/golibrary/` directory shows how to use cagent as a Go library. Needs a dedicated page. *(Completed: pages/guides/go-sdk.html)* +- [x] **Hooks** — `hooks` agent config (`pre_tool_use`, `post_tool_use`, `session_start`, `session_end`) is a significant feature with no documentation page. Covers running shell commands at various agent lifecycle points. *(Completed: pages/configuration/hooks.html)* +- [x] **Permissions** — Top-level `permissions` config with `allow`/`deny` glob patterns for tool call approval. Mentioned briefly in TUI page but has no dedicated reference. *(Completed: pages/configuration/permissions.html)* +- [x] **Sandbox Mode** — Shell tool `sandbox` config runs commands in Docker containers. Includes `image` and `paths` (bind mounts with `:ro` support). Not documented. *(Completed: pages/configuration/sandbox.html)* +- [x] **Structured Output** — Agent-level `structured_output` config (name, description, schema, strict). Forces model responses into a JSON schema. Not documented. 
*(Completed: pages/configuration/structured-output.html)* +- [x] **Model Routing** — Model-level `routing` config with rules that route requests to different models based on example phrases (rule-based router). *(Completed: pages/configuration/routing.html)* +- [x] **Custom Providers (top-level `providers` section)** — The `providers` top-level config key for defining reusable provider definitions with `api_type`, `base_url`, `token_key`. *(Completed: Added to pages/configuration/overview.html)* +- [x] **LSP Tool** — `type: lsp` toolset that provides Language Server Protocol integration (diagnostics, code actions, references, rename, etc.). Not documented anywhere. *(Completed: pages/tools/lsp.html)* +- [x] **User Prompt Tool** — `type: user_prompt` toolset that allows agents to ask the user for input mid-conversation. Not documented. *(Completed: pages/tools/user-prompt.html)* +- [x] **API Tool** — `api_config` on toolsets for defining HTTP API tools with endpoint, method, headers, args, and output_schema. Not documented. *(Completed: pages/tools/api.html)* +- [ ] **Branching Sessions** — TUI feature (v1.20.6) allowing editing previous messages to create branches. Mentioned in TUI page but could use more detail. -## Tools -- [x] Add missing built-in tools documentation (think, todo, memory, fetch, script, etc.) -- [x] Expand Tool Config page with more detailed content and examples for each tool type +## Missing Details in Existing Pages -## Features -- [x] Fix broken TUI demo GIF link on the home page +### Configuration > Agents (`agents.html`) -## Core Concepts -- [x] Move Agent Distribution from Features to Core Concepts (packaging & sharing is a fundamental concept) +- [x] `welcome_message` — Agent property not listed in schema or properties table *(Added)* +- [x] `handoffs` — Agent property for listing agents that can be handed off to (different from `sub_agents`). Not documented. 
*(Added)* +- [x] `add_prompt_files` — Agent property for including additional prompt files. *(Added)* +- [x] `add_description_parameter` — Agent property. *(Added)* +- [x] `code_mode_tools` — Agent property. *(Added)* +- [x] `hooks` — Agent property. Not shown in schema or properties table. *(Added with link to hooks page)* +- [x] `structured_output` — Agent property. Not shown in schema or properties table. *(Added with link to structured-output page)* +- [x] `defer` — Tool deferral configuration. *(Added with examples)* +- [x] `permissions` — Permission configuration. *(Added with link to permissions page)* +- [x] `sandbox` — Sandbox mode configuration. *(Added with link to sandbox page)* -## New Pages -- [ ] Remote runtime — remote runtime/server mode is not documented anywhere -- [ ] Changelog / What's New — could add a page or section sourced from `CHANGELOG.md` +### Configuration > Tools (`tools.html`) + +- [x] **LSP toolset** (`type: lsp`) — Language Server Protocol integration. *(Added with link to dedicated page)* +- [x] **User Prompt toolset** (`type: user_prompt`) — User input collection. *(Added with link to dedicated page)* +- [x] **API toolset** (`type: api`) — HTTP API tools. *(Added with link to dedicated page)* +- [x] **Handoff toolset** (`type: handoff`) — A2A agent delegation. *(Added)* +- [x] **A2A toolset** (`type: a2a`) — Toolset for connecting to remote A2A agents with `name` and `url`. *(Added)* +- [x] **Shared todo** (`shared: true`) — Todo toolset option for sharing todos across agents. *(Added)* +- [x] **Filesystem `post_edit`** — Post-edit commands that run after file edits (e.g., auto-format). *(Added)* +- [x] **Filesystem `ignore_vcs`** — Option to ignore VCS (.gitignore) files. *(Added)* +- [x] **Shell `env`** — Environment variables for shell/script/mcp/lsp tools. *(Added)* +- [x] **Fetch `timeout`** — Fetch tool timeout configuration. 
*(Added)* +- [x] **Script tool format** — Updated to show the correct `shell` map format with args, required, env, working_dir. *(Fixed)* +- [x] **MCP `config`** — The `config` field on MCP toolsets. *(Added)* + +### Configuration > Models (`models.html`) + +- [x] `track_usage` — Model property to track token usage. *(Added)* +- [x] `token_key` — Model property for specifying the env var holding the API token. *(Added)* +- [x] `routing` — Model property for rule-based routing. *(Added with link to routing page)* +- [x] `base_url` — Added examples showing how to use it with custom/self-hosted endpoints. *(Added)* + +### Configuration > Overview (`overview.html`) + +- [x] `metadata` — Top-level config section (author, license, description, readme, version). *(Added)* +- [x] `permissions` — Top-level config section. *(Added link to permissions page)* +- [x] `providers` — Top-level section. *(Added full documentation)* +- [x] Config `version` field — Current version is "5" but not documented what it means or how migration works. *(Added)* +- [x] **Advanced configuration cards** — Added cards linking to Hooks, Permissions, Sandbox, and Structured Output pages. *(Done)* + +### Features > CLI (`cli.html`) + +- [x] `--prompt-file` flag — Explanation of how it works (includes file contents as system context). *(Added)* +- [x] `--session` with relative references — e.g., `-1` for last session, `-2` for second to last. *(Added)* +- [x] Multi-turn conversations in `cagent exec` — Added example. *(Added)* +- [x] Queueing multiple messages: `cagent run question1 question2 ...` *(Added)* +- [x] `cagent eval` flags — Added examples with flags. *(Added)* +- [x] `cagent build` command — *(Added)* +- [ ] `--exit-on-stdin-eof` flag — Hidden flag, low priority. +- [ ] `--keep-containers` flag for eval — Already documented in eval page. 
+ +### Features > TUI (`tui.html`) + +- [x] Ctrl+R reverse history search — *(Added with dedicated section)* +- [x] `/title` command for renaming sessions — *(Already documented)* +- [x] `/think` command to toggle thinking at runtime — *(Already documented)* +- [x] Custom themes and hot-reloading — *(Already documented)* +- [x] Ctrl+Z to suspend TUI — *(Added)* +- [x] Ctrl+L audio listening shortcut — *(Added)* +- [x] Ctrl+X to clear queued messages — *(Added)* +- [x] Permissions view dialog — *(Mentioned)* +- [x] Model picker / switching during session — *(Already documented)* +- [ ] Branching sessions (edit previous messages) — Mentioned but could have more detail. +- [ ] Double-click title to edit — Minor feature. + +### Features > Skills (`skills.html`) + +- [x] Skill invocation via slash commands — *(Added)* +- [x] Recursive `~/.agents/skills` directory support — *(Clarified in table)* + +### Features > Evaluation (`evaluation.html`) + +- [x] `--keep-containers` flag — *(Already documented)* +- [x] Session database produced for investigation — *(Added note)* +- [x] Debugging tip for failed evals — *(Added callout)* + +### Providers + +- [x] **Mistral** — Listed as built-in alias but has no dedicated page or usage examples. *(Completed: pages/providers/mistral.html)* +- [x] **xAI (Grok)** — *(Completed: pages/providers/xai.html)* +- [x] **Nebius** — *(Completed: pages/providers/nebius.html)* +- [x] **Ollama** — Can be used via custom providers. *(Completed: pages/providers/local.html - covers Ollama, vLLM, LocalAI)* + +### Features > RAG (`rag.html`) + +- [x] `respect_vcs` option — *(Added with default value)* +- [x] `return_full_content` results option — *(Added)* +- [ ] Code-aware chunking (`code_aware: true`) with tree-sitter — Partially documented, the option is shown in examples. + +### Community > Troubleshooting + +- [x] Common errors: context window exceeded, max iterations reached, model fallback behavior. 
*(Added)* +- [x] Debugging with `--debug` and `--log-file` — *(Already documented)* + +## Tips & Best Practices + +- [x] *(Completed: pages/guides/tips.html)* - Comprehensive tips page covering: + - [x] **Tip: Using `--yolo` mode** — Auto-approve all tool calls. Security implications and when it's appropriate. + - [x] **Tip: Environment variable interpolation in commands** — Commands support `${env.VAR}` and `${env.VAR || 'default'}` JavaScript template syntax. + - [x] **Tip: Fallback model strategy** — Best practices for choosing fallback models. + - [x] **Tip: Deferred tools for performance** — Use `defer: true` to load tools only when needed. + - [x] **Tip: Combining handoffs and sub_agents** — Explain the difference. + - [x] **Tip: Using the `auto` model** — The special `auto` model value for automatic model selection. + - [x] **Tip: Model aliases and pinning** — cagent automatically resolves model aliases to pinned versions. + - [x] **Tip: User-defined default model** — Users can define their own default model in global configuration. *(Added)* + - [x] **Tip: Usage on GitHub** — Example of the PR reviewer. 
*(Added)* + +## Navigation Updates + +- [x] Added Model Routing to Configuration section +- [x] Added xAI (Grok) and Nebius to Model Providers section +- [x] Added Guides section with Tips & Best Practices and Go SDK + +## Remaining Low-Priority Items + +- [ ] Branching sessions — More detailed documentation (currently mentioned) +- [ ] Double-click title to edit in TUI — Minor feature +- [ ] `--exit-on-stdin-eof` flag — Hidden flag for integration +- [ ] Code-aware chunking detail — Already shown in examples diff --git a/docs/js/app.js b/docs/js/app.js index c74cf22cb..9e14188ac 100644 --- a/docs/js/app.js +++ b/docs/js/app.js @@ -29,6 +29,19 @@ const NAV = [ { title: 'Agent Config', page: 'configuration/agents' }, { title: 'Model Config', page: 'configuration/models' }, { title: 'Tool Config', page: 'configuration/tools' }, + { title: 'Hooks', page: 'configuration/hooks' }, + { title: 'Permissions', page: 'configuration/permissions' }, + { title: 'Sandbox Mode', page: 'configuration/sandbox' }, + { title: 'Structured Output', page: 'configuration/structured-output' }, + { title: 'Model Routing', page: 'configuration/routing' }, + ], + }, + { + heading: 'Built-in Tools', + items: [ + { title: 'LSP Tool', page: 'tools/lsp' }, + { title: 'User Prompt Tool', page: 'tools/user-prompt' }, + { title: 'API Tool', page: 'tools/api' }, ], }, { @@ -55,9 +68,20 @@ const NAV = [ { title: 'Google Gemini', page: 'providers/google' }, { title: 'AWS Bedrock', page: 'providers/bedrock' }, { title: 'Docker Model Runner', page: 'providers/dmr' }, + { title: 'Mistral', page: 'providers/mistral' }, + { title: 'xAI (Grok)', page: 'providers/xai' }, + { title: 'Nebius', page: 'providers/nebius' }, + { title: 'Local Models', page: 'providers/local' }, { title: 'Custom Providers', page: 'providers/custom' }, ], }, + { + heading: 'Guides', + items: [ + { title: 'Tips & Best Practices', page: 'guides/tips' }, + { title: 'Go SDK', page: 'guides/go-sdk' }, + ], + }, { heading: 'Community', 
items: [ diff --git a/docs/pages/community/troubleshooting.html b/docs/pages/community/troubleshooting.html index 5c89eb10a..ef6e3d51f 100644 --- a/docs/pages/community/troubleshooting.html +++ b/docs/pages/community/troubleshooting.html @@ -1,6 +1,49 @@

Troubleshooting

Common issues and how to resolve them when working with cagent.

+

Common Errors

+ +

Context Window Exceeded

+ +

The conversation plus prompt exceeded the model's context window. The provider returns context_length_exceeded or a similar error.

+ + + +
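Typical mitigations are to cap how much history is resent and to bound response size. A sketch using the num_history_items (agent) and max_tokens (model) options documented on the Agent Config and Model Config pages; the values shown are illustrative:

```yaml
agents:
  root:
    model: claude
    description: Assistant with a bounded context
    instruction: You are a helpful assistant.
    num_history_items: 30   # resend at most the last 30 history items

models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-0
    max_tokens: 8192        # bound the size of each response
```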

Max Iterations Reached

+ +

The agent hit its max_iterations limit without completing the task.

+ + + +
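If the task legitimately needs more tool-calling loops, raise the limit. A sketch (the value is illustrative; the Agent Config page suggests 20–50 as typical for development agents):

```yaml
agents:
  root:
    model: openai/gpt-4o
    description: Development assistant
    instruction: You are a helpful coding assistant.
    max_iterations: 50   # 0 (the default) means unlimited
```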

Model Fallback Triggered

+ +

When the primary model fails, cagent automatically switches to fallback models. Look for log messages like "Switching to fallback model".

+ + + +

Configure fallback behavior in your agent config:

+ +
agents:
+  root:
+    model: anthropic/claude-sonnet-4-0
+    fallback:
+      models: [openai/gpt-4o, openai/gpt-4o-mini]
+      retries: 2      # retries per model for 5xx errors
+      cooldown: 1m    # how long to stick with fallback after 429
+

Debug Mode

The first step for any issue is enabling debug logging. This provides detailed information about what cagent is doing internally.

diff --git a/docs/pages/configuration/agents.html b/docs/pages/configuration/agents.html index 449703362..11984a7f1 100644 --- a/docs/pages/configuration/agents.html +++ b/docs/pages/configuration/agents.html @@ -8,20 +8,40 @@

Full Schema

model: string # Required: model reference description: string # Required: what this agent does instruction: string # Required: system prompt - sub_agents: [list] # Optional: sub-agent names + sub_agents: [list] # Optional: sub-agent names toolsets: [list] # Optional: tool configurations rag: [list] # Optional: RAG source references - fallback: # Optional: fallback config + fallback: # Optional: fallback config models: [list] retries: 2 cooldown: 1m add_date: boolean # Optional: add date to context add_environment_info: boolean # Optional: add env info to context + add_prompt_files: [list] # Optional: include additional prompt files + add_description_parameter: bool # Optional: add description to tool schema + code_mode_tools: boolean # Optional: enable code mode tool format max_iterations: int # Optional: max tool-calling loops num_history_items: int # Optional: limit conversation history - skills: boolean # Optional: enable skill discovery - commands: # Optional: named prompts - name: "prompt text" + skills: boolean # Optional: enable skill discovery + commands: # Optional: named prompts + name: "prompt text" + welcome_message: string # Optional: message shown at session start + handoffs: [list] # Optional: list of A2A handoff agents + defer: [list] or true # Optional: tools to load on demand + hooks: # Optional: lifecycle hooks + pre_tool_use: [list] + post_tool_use: [list] + session_start: [list] + session_end: [list] + permissions: # Optional: tool execution control + allow: [list] + deny: [list] + sandbox: # Optional: shell isolation + image: string + paths: [list] + structured_output: # Optional: constrain output format + name: string + schema: object
💡 See also
@@ -67,6 +87,18 @@

Properties Reference

add_environment_infoboolean✗ When true, injects working directory, OS, CPU architecture, and git info into context. + + add_prompt_filesarray✗ + List of file paths whose contents are appended to the system prompt. Useful for including coding standards, guidelines, or additional context. + + + add_description_parameterboolean✗ + When true, adds agent descriptions as a parameter in tool schemas. Helps with tool selection in multi-agent scenarios. + + + code_mode_toolsboolean✗ + When true, formats tool responses in a code-optimized format with structured output schemas. Useful for MCP gateway and programmatic access. + max_iterationsint✗ Maximum number of tool-calling loops. Default: unlimited (0). Set this to prevent infinite loops. @@ -87,6 +119,34 @@

Properties Reference

commandsobject✗ Named prompts that can be run with cagent run config.yaml /command_name. + + welcome_messagestring✗ + Message displayed to the user when a session starts. Useful for providing context or instructions. + + + handoffsarray✗ + List of A2A agent configurations this agent can delegate to. See A2A Protocol. + + + deferarray/boolean✗ + Tools to load on-demand rather than at startup. Set to true to defer all tools, or provide a list of tool names. + + + hooksobject✗ + Lifecycle hooks for running commands at various points. See Hooks. + + + permissionsobject✗ + Control which tools are auto-approved, require confirmation, or are blocked. See Permissions. + + + sandboxobject✗ + Run shell commands in an isolated Docker container. See Sandbox Mode. + + + structured_outputobject✗ + Constrain agent output to match a JSON schema. See Structured Output. + @@ -95,6 +155,52 @@

Properties Reference

Default is 0 (unlimited). Always set max_iterations for agents with powerful tools like shell to prevent infinite loops. A value of 20–50 is typical for development agents.

+

Welcome Message

+ +

Display a message when users start a session:

+ +
agents:
+  assistant:
+    model: openai/gpt-4o
+    description: Development assistant
+    instruction: You are a helpful coding assistant.
+    welcome_message: |
+      👋 Welcome! I'm your development assistant.
+      
+      I can help you with:
+      - Writing and reviewing code
+      - Running tests and debugging
+      - Explaining concepts
+      
+      What would you like to work on?
+ +

Deferred Tool Loading

+ +

Load tools on-demand to speed up agent startup:

+ +
agents:
+  root:
+    model: anthropic/claude-sonnet-4-0
+    description: Multi-purpose assistant
+    instruction: You have access to many tools.
+    toolsets:
+      - type: mcp
+        ref: docker:github-official
+      - type: mcp
+        ref: docker:slack
+      - type: filesystem
+    # Defer all tools - load when first used
+    defer: true
+ +

Or defer specific tools:

+ +
agents:
+  root:
+    model: openai/gpt-4o
+    defer:
+      - "mcp:github:*"    # Defer all GitHub tools
+      - "mcp:slack:*"     # Defer all Slack tools
+
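structured_output has no worked example on this page. A minimal sketch (the field set of name, description, schema, and strict follows the dedicated Structured Output page; the schema contents are illustrative):

```yaml
agents:
  root:
    model: openai/gpt-4o
    description: Code reviewer
    instruction: Review the given code and report issues.
    structured_output:
      name: review_result
      description: Structured verdict of a code review
      strict: true
      schema:
        type: object
        properties:
          verdict: { type: string }
          issues: { type: array, items: { type: string } }
        required: [verdict, issues]
```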

Fallback Configuration

Automatically switch to backup models when the primary fails:

@@ -160,6 +266,7 @@

Complete Example

instruction: | You are a technical lead. Analyze requests and delegate to the right specialist. Always review work before responding. + welcome_message: "👋 I'm your tech lead. How can I help today?" sub_agents: [developer, researcher] add_date: true add_environment_info: true @@ -169,6 +276,10 @@

Complete Example

- type: think commands: review: "Review all recent code changes for issues" + hooks: + session_start: + - type: command + command: "./scripts/setup.sh" developer: model: claude @@ -180,6 +291,17 @@

Complete Example

- type: shell - type: think - type: todo + permissions: + allow: + - "read_*" + - "shell:cmd=go*" + - "shell:cmd=npm*" + deny: + - "shell:cmd=sudo*" + sandbox: + image: golang:1.23-alpine + paths: + - "." researcher: model: openai/gpt-4o diff --git a/docs/pages/configuration/hooks.html b/docs/pages/configuration/hooks.html new file mode 100644 index 000000000..bfc1bef90 --- /dev/null +++ b/docs/pages/configuration/hooks.html @@ -0,0 +1,233 @@ +

Hooks

+

Run shell commands at various points during agent execution for deterministic control over behavior.

+ +

Overview

+ +

Hooks allow you to execute shell commands or scripts at key points in an agent's lifecycle. They provide deterministic control that works alongside the LLM's behavior, enabling validation, logging, environment setup, and more.

+ +
+
ℹ️ Use Cases
+ +
+ +

Hook Types

+ +

There are four hook event types:

+ + + + + + + + + +
EventWhen it firesCan block?
pre_tool_useBefore a tool call executesYes
post_tool_useAfter a tool completes successfullyNo
session_startWhen a session begins or resumesNo
session_endWhen a session terminatesNo
+ +

Configuration

+ +
agents:
+  root:
+    model: openai/gpt-4o
+    description: An agent with hooks
+    instruction: You are a helpful assistant.
+    hooks:
+      # Run before specific tools
+      pre_tool_use:
+        - matcher: "shell|edit_file"
+          hooks:
+            - type: command
+              command: "./scripts/validate-command.sh"
+              timeout: 30
+      
+      # Run after all tool calls
+      post_tool_use:
+        - matcher: "*"
+          hooks:
+            - type: command
+              command: "./scripts/log-tool-call.sh"
+      
+      # Run when session starts
+      session_start:
+        - type: command
+          command: "./scripts/setup-env.sh"
+      
+      # Run when session ends
+      session_end:
+        - type: command
+          command: "./scripts/cleanup.sh"
+ +

Matcher Patterns

+ +

The matcher field uses regex patterns to match tool names:

+ + + + + + + + + +
PatternMatches
*All tools
shellOnly the shell tool
shell|edit_fileEither shell or edit_file
mcp:.*All MCP tools (regex)
+ +

Hook Input

+ +

Hooks receive JSON input via stdin with context about the event:

+ +
{
+  "session_id": "abc123",
+  "cwd": "/path/to/project",
+  "hook_event_name": "pre_tool_use",
+  "tool_name": "shell",
+  "tool_use_id": "call_xyz",
+  "tool_input": {
+    "cmd": "rm -rf /tmp/cache",
+    "cwd": "."
+  }
+}
+ +

Input Fields by Event Type

+ + + + + + + + + + + + + + +
Fieldpre_tool_usepost_tool_usesession_startsession_end
session_id
cwd
hook_event_name
tool_name
tool_use_id
tool_input
tool_response
source
reason
+ +

The source field for session_start can be: startup, resume, clear, or compact.

+

The reason field for session_end can be: clear, logout, prompt_input_exit, or other.

+ +
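For instance, a session_start hook can branch on source so that expensive setup runs only on a fresh startup, not on resume, clear, or compact. A sketch, written as a shell function for illustration (a real hook is a standalone script reading stdin, and the grep-based extraction is a simplification of the jq parsing used in the example scripts on this page):

```shell
# Hypothetical session_start hook: run one-time setup only when the
# session source is "startup". The setup work is represented by a
# stderr log line here.
on_session_start() {
  input=$(cat)
  if printf '%s' "$input" | grep -q '"source"[[:space:]]*:[[:space:]]*"startup"'; then
    echo "one-time setup would run here" >&2   # placeholder for real setup
  fi
  # Always let the session continue
  echo '{"continue": true}'
}
```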

Hook Output

+ +

Hooks communicate back via JSON output to stdout:

+ +
{
+  "continue": true,
+  "stop_reason": "Optional message when continue=false",
+  "suppress_output": false,
+  "system_message": "Warning message to show user",
+  "decision": "allow",
+  "reason": "Explanation for the decision",
+  "hook_specific_output": {
+    "hook_event_name": "pre_tool_use",
+    "permission_decision": "allow",
+    "permission_decision_reason": "Command is safe",
+    "updated_input": { "cmd": "modified command" }
+  }
+}
+ +

Output Fields

+ + + + + + + + + + + +
FieldTypeDescription
continuebooleanWhether to continue execution (default: true)
stop_reasonstringMessage to show when continue=false
suppress_outputbooleanHide stdout from transcript
system_messagestringWarning message to display to user
decisionstringFor blocking: block to prevent operation
reasonstringExplanation for the decision
+ +

Pre-Tool-Use Specific Output

+ +

The hook_specific_output for pre_tool_use supports:

+ + + + + + + + +
FieldTypeDescription
permission_decisionstringallow, deny, or ask
permission_decision_reasonstringExplanation for the decision
updated_inputobjectModified tool input (replaces original)
+ +

Exit Codes

+ +

Hook exit codes have special meaning:

+ + + + + + + + +
Exit CodeMeaning
0Success — continue normally
2Blocking error — stop the operation
OtherError — logged but execution continues
+ +

Example: Validation Script

+ +

A simple pre-tool-use hook that blocks dangerous shell commands:

+ +
#!/bin/bash
+# scripts/validate-command.sh
+
+# Read JSON input from stdin
+INPUT=$(cat)
+TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name')
+CMD=$(echo "$INPUT" | jq -r '.tool_input.cmd // empty')
+
+# Block dangerous commands
+if [[ "$TOOL_NAME" == "shell" ]]; then
+  if [[ "$CMD" =~ ^sudo ]] || [[ "$CMD" =~ rm.*-rf ]]; then
+    echo '{"decision": "block", "reason": "Dangerous command blocked by policy"}'
+    exit 2
+  fi
+fi
+
+# Allow everything else
+echo '{"decision": "allow"}'
+exit 0
+ +

Example: Audit Logging

+ +

A post-tool-use hook that logs all tool calls:

+ +
#!/bin/bash
+# scripts/log-tool-call.sh
+
+INPUT=$(cat)
+TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
+TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name')
+SESSION_ID=$(echo "$INPUT" | jq -r '.session_id')
+
+# Append to audit log
+echo "$TIMESTAMP | $SESSION_ID | $TOOL_NAME" >> ./audit.log
+
+# Don't block execution
+echo '{"continue": true}'
+exit 0
+ +

Timeout

+ +

Hooks have a default timeout of 60 seconds. You can customize this per hook:

+ +
hooks:
+  pre_tool_use:
+    - matcher: "*"
+      hooks:
+        - type: command
+          command: "./slow-validation.sh"
+          timeout: 120  # 2 minutes
+ +
+
⚠️ Performance
+

Hooks run synchronously and can slow down agent execution. Keep hook scripts fast and efficient. Consider using suppress_output: true for logging hooks to reduce noise.

+
diff --git a/docs/pages/configuration/models.html b/docs/pages/configuration/models.html index 7a6f4bf3f..259f45469 100644 --- a/docs/pages/configuration/models.html +++ b/docs/pages/configuration/models.html @@ -13,8 +13,11 @@

Full Schema

frequency_penalty: float # Optional: 0.0–2.0 presence_penalty: float # Optional: 0.0–2.0 base_url: string # Optional: custom API endpoint + token_key: string # Optional: env var for API token thinking_budget: string|int # Optional: reasoning effort parallel_tool_calls: boolean # Optional: allow parallel tool calls + track_usage: boolean # Optional: track token usage + routing: [list] # Optional: rule-based model routing provider_opts: # Optional: provider-specific options key: value @@ -30,9 +33,12 @@

Properties Reference

top_pfloat✗Nucleus sampling threshold frequency_penaltyfloat✗Penalize repeated tokens (0.0–2.0) presence_penaltyfloat✗Encourage topic diversity (0.0–2.0) - base_urlstring✗Custom API endpoint URL + base_urlstring✗Custom API endpoint URL (for self-hosted or proxied endpoints) + token_keystring✗Environment variable name containing the API token (overrides provider default) thinking_budgetstring/int✗Reasoning effort control parallel_tool_callsboolean✗Allow model to call multiple tools at once + track_usageboolean✗Track and report token usage for this model + routingarray✗Rule-based routing to different models. See Model Routing. provider_optsobject✗Provider-specific options (see provider pages) @@ -128,3 +134,30 @@

Examples by Provider

max_tokens: 8192

For detailed provider setup, see the Model Providers section.

+ +

Custom Endpoints

+ +

Use base_url to point to custom or self-hosted endpoints:

+ +
models:
+  # Azure OpenAI
+  azure_gpt:
+    provider: openai
+    model: gpt-4o
+    base_url: https://my-resource.openai.azure.com/openai/deployments/gpt-4o
+    token_key: AZURE_OPENAI_API_KEY
+
+  # Self-hosted vLLM
+  local_llama:
+    provider: openai  # vLLM is OpenAI-compatible
+    model: meta-llama/Llama-3.2-3B-Instruct
+    base_url: http://localhost:8000/v1
+
+  # Proxy or gateway
+  proxied:
+    provider: openai
+    model: gpt-4o
+    base_url: https://proxy.internal.company.com/openai/v1
+    token_key: INTERNAL_API_KEY
+ +

See Local Models for more examples of custom endpoints.

diff --git a/docs/pages/configuration/overview.html b/docs/pages/configuration/overview.html index 8263fc310..fb7607d77 100644 --- a/docs/pages/configuration/overview.html +++ b/docs/pages/configuration/overview.html @@ -3,16 +3,25 @@

Configuration Overview

File Structure

-

A cagent YAML config has three main sections:

+

A cagent YAML config has these main sections:

-
# 1. Models — define AI models with their parameters
+
# 1. Version — configuration schema version (optional but recommended)
+version: 5
+
+# 2. Metadata — optional agent metadata for distribution
+metadata:
+  author: my-org
+  description: My helpful agent
+  version: "1.0.0"
+
+# 3. Models — define AI models with their parameters
 models:
   claude:
     provider: anthropic
     model: claude-sonnet-4-0
     max_tokens: 64000
 
-# 2. Agents — define AI agents with their behavior
+# 4. Agents — define AI agents with their behavior
 agents:
   root:
     model: claude
@@ -21,12 +30,25 @@ 

File Structure

toolsets: - type: think -# 3. Providers — optional custom provider definitions +# 5. RAG — define retrieval-augmented generation sources (optional) +rag: + docs: + docs: ["./docs"] + strategies: + - type: chunked-embeddings + model: openai/text-embedding-3-small + +# 6. Providers — optional custom provider definitions providers: my_provider: api_type: openai_chatcompletions base_url: https://api.example.com/v1 - token_key: MY_API_KEY
+ token_key: MY_API_KEY + +# 7. Permissions — global tool permission rules (optional) +permissions: + allow: ["read_*"] + deny: ["shell:cmd=sudo*"]

Minimal Config

@@ -61,7 +83,7 @@

Config Sections

🤖

Agent Config

-

All agent properties: model, instruction, tools, sub-agents, and more.

+

All agent properties: model, instruction, tools, sub-agents, permissions, hooks, and more.

🧠
@@ -71,7 +93,32 @@

Model Config

🔧

Tool Config

-

Built-in tools, MCP tools, Docker MCP, and tool filtering.

+

Built-in tools, MCP tools, Docker MCP, LSP, API tools, and tool filtering.

+
+ + +

Advanced Configuration

+ +
+ +
+

Hooks

+

Run shell commands at lifecycle events like tool calls and session start/end.

+
+ +
🔐
+

Permissions

+

Control which tools auto-approve, require confirmation, or are blocked.

+
+ +
📦
+

Sandbox Mode

+

Run shell commands in an isolated Docker container for security.

+
+ +
📋
+

Structured Output

+

Constrain agent responses to match a specific JSON schema.

@@ -113,3 +160,76 @@

JSON Schema

For editor autocompletion and validation, use the cagent JSON Schema. Add this to the top of your YAML file:

# yaml-language-server: $schema=https://raw.githubusercontent.com/docker/cagent/main/cagent-schema.json
+ +

Config Versioning

+ +

cagent configs are versioned. The current version is 5. Add the version at the top of your config:

+ +
version: 5
+
+agents:
+  root:
+    model: openai/gpt-4o
+    # ...
+ +

When you load an older config, cagent automatically migrates it to the latest schema. Including the version explicitly is recommended to ensure consistent behavior.

+ +

Metadata Section

+ +

Optional metadata for agent distribution via OCI registries:

+ +
metadata:
+  author: my-org
+  license: Apache-2.0
+  description: A helpful coding assistant
+  readme: |  # Displayed in registries
+    This agent helps with coding tasks.
+  version: "1.0.0"
+ + + + + + + + + + +
FieldDescription
authorAuthor or organization name
licenseLicense identifier (e.g., Apache-2.0, MIT)
descriptionShort description for the agent
readmeLonger markdown description
versionSemantic version string
+ +

See Agent Distribution for publishing agents to registries.

+ +

Custom Providers Section

+ +

Define reusable provider configurations for custom or self-hosted endpoints:

+ +
providers:
+  azure:
+    api_type: openai_chatcompletions
+    base_url: https://my-resource.openai.azure.com/openai/deployments/gpt-4o
+    token_key: AZURE_OPENAI_API_KEY
+  
+  internal_llm:
+    api_type: openai_chatcompletions
+    base_url: https://llm.internal.company.com/v1
+    token_key: INTERNAL_API_KEY
+
+models:
+  azure_gpt:
+    provider: azure  # References the custom provider
+    model: gpt-4o
+
+agents:
+  root:
+    model: azure_gpt
+ + + + + + + + +
FieldDescription
api_typeAPI schema: openai_chatcompletions (default) or openai_responses
base_urlBase URL for the API endpoint
token_keyEnvironment variable name for the API token
+ +

See Custom Providers for more details.

diff --git a/docs/pages/configuration/permissions.html b/docs/pages/configuration/permissions.html new file mode 100644 index 000000000..3715dd001 --- /dev/null +++ b/docs/pages/configuration/permissions.html @@ -0,0 +1,181 @@ +

Permissions

+

Control which tools can execute automatically, require confirmation, or are blocked entirely.

+ +

Overview

+ +

Permissions provide fine-grained control over tool execution. You can configure which tools are auto-approved (run without asking), which require user confirmation, and which are completely blocked.

+ +
+
ℹ️ Evaluation Order
+

Permissions are evaluated in this order: Deny → Allow → Ask. Deny patterns take priority, then allow patterns, and anything else defaults to asking for user confirmation.

+
+ +

Configuration

+ +
agents:
+  root:
+    model: openai/gpt-4o
+    description: Agent with permission controls
+    instruction: You are a helpful assistant.
+    permissions:
+      # Auto-approve these tools (no confirmation needed)
+      allow:
+        - "read_file"
+        - "read_*"           # Glob patterns
+        - "shell:cmd=ls*"    # With argument matching
+      
+      # Block these tools entirely
+      deny:
+        - "shell:cmd=sudo*"
+        - "shell:cmd=rm*-rf*"
+        - "dangerous_tool"
+ +

Pattern Syntax

+ +

Permissions support glob-style patterns with optional argument matching:

+ +

Simple Patterns

+ + + + + + + + + +
PatternMatches
shellExact match for shell tool
read_*Any tool starting with read_
mcp:github:*Any GitHub MCP tool
*All tools
+ +
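These patterns drop directly into the allow/deny lists. A minimal sketch combining the patterns from the table above (tool names are illustrative):

```yaml
permissions:
  allow:
    - "read_*"          # any tool whose name starts with read_
    - "mcp:github:*"    # any tool from the GitHub MCP server
  deny:
    - "shell"           # exact match: block the shell tool entirely
```

Anything not matched by either list falls through to the default Ask behavior.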

Argument Matching

+ +

You can match tools based on their argument values using tool:arg=pattern syntax:

+ +
permissions:
+  allow:
+    # Allow shell only when cmd starts with "ls" or "cat"
+    - "shell:cmd=ls*"
+    - "shell:cmd=cat*"
+    
+    # Allow edit_file only in specific directory
+    - "edit_file:path=/home/user/safe/*"
+  
+  deny:
+    # Block shell with sudo
+    - "shell:cmd=sudo*"
+    
+    # Block writes to system directories
+    - "write_file:path=/etc/*"
+    - "write_file:path=/usr/*"
+ +

Multiple Argument Conditions

+ +

Chain multiple argument conditions with colons. All conditions must match:

+ +
permissions:
+  allow:
+    # Allow shell with ls in current directory
+    - "shell:cmd=ls*:cwd=."
+  
+  deny:
+    # Block shell with rm -rf anywhere
+    - "shell:cmd=rm*:cmd=*-rf*"
+ +

Glob Pattern Rules

+ +

Patterns follow filepath.Match semantics with some extensions:

+ + + +

Matching is case-insensitive.

+ +
+
💡 Trailing Wildcards
+

Trailing wildcards like sudo* match any characters including spaces, so sudo* matches sudo rm -rf /.

+
+ +

Decision Types

+ + + + + + + + +
DecisionBehavior
AllowTool executes immediately without user confirmation
AskUser must confirm before tool executes (default)
DenyTool is blocked and returns an error to the agent
+ +

Examples

+ +

Read-Only Agent

+ +

Allow all read operations, block all writes:

+ +
permissions:
+  allow:
+    - "read_file"
+    - "read_multiple_files"
+    - "list_directory"
+    - "directory_tree"
+    - "search_files_content"
+  deny:
+    - "write_file"
+    - "edit_file"
+    - "shell"
+ +

Safe Shell Agent

+ +

Allow specific safe commands, block dangerous ones:

+ +
permissions:
+  allow:
+    - "shell:cmd=ls*"
+    - "shell:cmd=cat*"
+    - "shell:cmd=grep*"
+    - "shell:cmd=find*"
+    - "shell:cmd=head*"
+    - "shell:cmd=tail*"
+    - "shell:cmd=wc*"
+  deny:
+    - "shell:cmd=sudo*"
+    - "shell:cmd=rm*"
+    - "shell:cmd=mv*"
+    - "shell:cmd=chmod*"
+    - "shell:cmd=chown*"
+ +

MCP Tool Permissions

+ +

Control MCP tools by their qualified names:

+ +
permissions:
+  allow:
+    # Allow all GitHub read operations
+    - "mcp:github:get_*"
+    - "mcp:github:list_*"
+    - "mcp:github:search_*"
+  deny:
+    # Block destructive GitHub operations
+    - "mcp:github:delete_*"
+    - "mcp:github:close_*"
+ +

Combining with Hooks

+ +

Permissions work alongside hooks. The evaluation order is:

+ +
  1. Check deny patterns — if matched, tool is blocked
  2. Check allow patterns — if matched, tool is auto-approved
  3. Run pre_tool_use hooks — hooks can allow, deny, or ask
  4. If no decision, ask the user for confirmation
+ +

Hooks can override allow decisions but cannot override deny decisions.

+ +
+
⚠️ Security Note
+

Permissions are enforced client-side. They help prevent accidental operations but should not be relied upon as a security boundary for untrusted agents. For stronger isolation, use sandbox mode.

+
diff --git a/docs/pages/configuration/routing.html b/docs/pages/configuration/routing.html new file mode 100644 index 000000000..8dbb9f6aa --- /dev/null +++ b/docs/pages/configuration/routing.html @@ -0,0 +1,160 @@ +

Model Routing

+

Route requests to different models based on the content of user messages.

+ +

Overview

+ +

Model routing lets you define a "router" model that automatically selects the best underlying model based on the user's message. This is useful for cost optimization, specialized handling, or load balancing across models.

+ +
+
ℹ️ How It Works
+

cagent uses NLP-based text similarity (via Bleve full-text search) to match user messages against example phrases you define. The route with the best-matching examples wins, and that model handles the request.

+
+ +

Configuration

+ +

Add routing rules to any model definition. The model's provider/model fields become the fallback when no route matches:

+ +
models:
+  smart_router:
+    # Fallback model when no routing rule matches
+    provider: openai
+    model: gpt-4o-mini
+    
+    # Routing rules
+    routing:
+      - model: anthropic/claude-sonnet-4-0
+        examples:
+          - "Write a detailed technical document"
+          - "Help me architect this system"
+          - "Review this code for security issues"
+          - "Explain this complex algorithm"
+      
+      - model: openai/gpt-4o
+        examples:
+          - "Generate some creative ideas"
+          - "Write a story about"
+          - "Help me brainstorm"
+          - "Come up with names for"
+      
+      - model: openai/gpt-4o-mini
+        examples:
+          - "What time is it"
+          - "Convert this to JSON"
+          - "Simple math calculation"
+          - "Translate this word"
+
+agents:
+  root:
+    model: smart_router
+    description: Assistant with intelligent model routing
+    instruction: You are a helpful assistant.
+ +

Routing Rules

+ +

Each routing rule has:

+ + + + + + + +
FieldTypeRequiredDescription
modelstringTarget model (inline format or reference to models section)
examplesarrayExample phrases that should route to this model
+ +

Matching Behavior

+ +

The router:

+ +
  1. Extracts the last user message from the conversation
  2. Searches all examples using full-text search
  3. Aggregates match scores by route (best score per route wins)
  4. Selects the route with the highest overall score
  5. Falls back to the base model if no good match is found
+ +
+
💡 Writing Good Examples
+ +
+ +

Use Cases

+ +

Cost Optimization

+ +

Route simple queries to cheaper models:

+ +
models:
+  cost_optimizer:
+    provider: openai
+    model: gpt-4o-mini  # Cheap fallback
+    routing:
+      - model: anthropic/claude-sonnet-4-0
+        examples:
+          - "Complex analysis"
+          - "Detailed research"
+          - "Multi-step reasoning"
+ +

Specialized Models

+ +

Route coding tasks to code-specialized models:

+ +
models:
+  task_router:
+    provider: openai
+    model: gpt-4o  # General fallback
+    routing:
+      - model: anthropic/claude-sonnet-4-0
+        examples:
+          - "Write code"
+          - "Debug this function"
+          - "Review my implementation"
+          - "Fix this bug"
+      - model: openai/gpt-4o
+        examples:
+          - "Write a blog post"
+          - "Help me with writing"
+          - "Summarize this document"
+ +

Load Balancing

+ +

Distribute load across equivalent models from different providers:

+ +
models:
+  load_balancer:
+    provider: openai
+    model: gpt-4o
+    routing:
+      - model: anthropic/claude-sonnet-4-0
+        examples:
+          - "First request pattern"
+          - "Another request type"
+      - model: google/gemini-2.5-flash
+        examples:
+          - "Different request pattern"
+          - "Alternative query style"
+ +

Debugging

+ +

Enable debug logging to see routing decisions:

+ +
$ cagent run config.yaml --debug
+ +

Look for log entries like:

+ +
"Rule-based router selected model" router=smart_router selected_model=anthropic/claude-sonnet-4-0
+"Route matched" model=anthropic/claude-sonnet-4-0 score=2.45
+ +
+
⚠️ Limitations
+ +
diff --git a/docs/pages/configuration/sandbox.html b/docs/pages/configuration/sandbox.html new file mode 100644 index 000000000..cfbf55f6d --- /dev/null +++ b/docs/pages/configuration/sandbox.html @@ -0,0 +1,152 @@ +

Sandbox Mode

+

Run shell commands in an isolated Docker container for enhanced security.

+ +

Overview

+ +

Sandbox mode runs shell tool commands inside a Docker container instead of directly on the host system. This provides an additional layer of isolation, limiting the potential impact of unintended or malicious commands.

+ +
+
ℹ️ Requirements
+

Sandbox mode requires Docker to be installed and running on the host system.

+
+ +

Configuration

+ +
agents:
+  root:
+    model: openai/gpt-4o
+    description: Agent with sandboxed shell
+    instruction: You are a helpful assistant.
+    toolsets:
+      - type: shell
+    sandbox:
+      image: alpine:latest    # Docker image to use
+      paths:                  # Directories to mount
+        - "."                 # Current directory (read-write)
+        - "/data:ro"          # Read-only mount
+ +

Properties

+ + + + + + + +
PropertyTypeDefaultDescription
imagestringalpine:latestDocker image to use for the sandbox container
pathsarray[]Host paths to mount into the container
+ +

Path Mounting

+ +

Paths can be specified with optional access modes:

+ + + + + + + + + + +
FormatDescription
/pathMount with read-write access (default)
/path:rwExplicitly read-write
/path:roRead-only mount
.Current working directory
./relativeRelative path (resolved from working directory)
+ +

Paths are mounted at the same location inside the container as on the host, so file paths in commands work the same way.

+ +
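For instance, the mount forms above can be combined in a single sandbox block (paths are illustrative):

```yaml
sandbox:
  image: alpine:latest
  paths:
    - "."            # current directory, read-write by default
    - "./build:rw"   # relative path, explicitly read-write
    - "/data:ro"     # absolute path, read-only
```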

Example: Development Agent

+ +
agents:
+  developer:
+    model: anthropic/claude-sonnet-4-0
+    description: Development agent with sandboxed shell
+    instruction: |
+      You are a software developer. Use the shell tool to run
+      build commands and tests. Your shell runs in a sandbox.
+    toolsets:
+      - type: shell
+      - type: filesystem
+    sandbox:
+      image: node:20-alpine    # Node.js environment
+      paths:
+        - "."                  # Project directory
+        - "/tmp:rw"            # Temp directory for builds
+ +

How It Works

+ +
  1. When the agent first uses the shell tool, cagent starts a Docker container
  2. The container runs with the specified image and mounted paths
  3. Shell commands execute inside the container via docker exec
  4. The container persists for the session (commands share state)
  5. When the session ends, the container is automatically stopped and removed
+ +

Container Configuration

+ +

Sandbox containers are started with these Docker options:

+ + + +

Orphan Container Cleanup

+ +

If cagent crashes or is killed, sandbox containers may be left running. cagent automatically cleans up orphaned containers from previous runs when it starts. Containers are identified by labels and the PID of the cagent process that created them.

+ +

Choosing an Image

+ +

Select a Docker image that has the tools your agent needs:

+ + + + + + + + + + +
Use CaseSuggested Image
General scriptingalpine:latest
Node.js developmentnode:20-alpine
Python developmentpython:3.12-alpine
Go developmentgolang:1.23-alpine
Full Linux environmentubuntu:24.04
+ +
+
💡 Custom Images
+

For complex setups, build a custom Docker image with all required tools pre-installed. This avoids installation time during agent execution.

+
+ +
+
⚠️ Limitations
+ +
+ +

Combining with Permissions

+ +

For defense in depth, combine sandbox mode with permissions:

+ +
agents:
+  root:
+    model: openai/gpt-4o
+    description: Secure development agent
+    instruction: You are a helpful assistant.
+    toolsets:
+      - type: shell
+      - type: filesystem
+    sandbox:
+      image: node:20-alpine
+      paths:
+        - ".:rw"
+    permissions:
+      allow:
+        - "shell:cmd=npm*"
+        - "shell:cmd=node*"
+        - "shell:cmd=ls*"
+      deny:
+        - "shell:cmd=sudo*"
+        - "shell:cmd=curl*"
+        - "shell:cmd=wget*"
diff --git a/docs/pages/configuration/structured-output.html b/docs/pages/configuration/structured-output.html new file mode 100644 index 000000000..0445ed56a --- /dev/null +++ b/docs/pages/configuration/structured-output.html @@ -0,0 +1,213 @@ +

Structured Output

+

Force the agent to respond with JSON matching a specific schema.

+ +

Overview

+ +

Structured output constrains the agent's responses to match a predefined JSON schema. This is useful for building agents that need to produce machine-readable output for downstream processing, API responses, or integration with other systems.

+ +
+
ℹ️ When to Use
+ +
+ +

Configuration

+ +
agents:
+  analyzer:
+    model: openai/gpt-4o
+    description: Code analyzer that outputs structured results
+    instruction: |
+      Analyze the provided code and identify issues.
+      Return your findings in the structured format.
+    structured_output:
+      name: analysis_result
+      description: Code analysis findings
+      strict: true
+      schema:
+        type: object
+        properties:
+          issues:
+            type: array
+            items:
+              type: object
+              properties:
+                severity:
+                  type: string
+                  enum: ["error", "warning", "info"]
+                line:
+                  type: integer
+                message:
+                  type: string
+              required: ["severity", "line", "message"]
+          summary:
+            type: string
+        required: ["issues", "summary"]
+ +
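With this configuration, a response from the analyzer must conform to the schema. It might look like the following (values are illustrative):

```json
{
  "issues": [
    {
      "severity": "warning",
      "line": 42,
      "message": "Variable 'result' is assigned but never used"
    }
  ],
  "summary": "One minor issue found; no errors."
}
```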

Properties

+ + + + + + + + + +
PropertyTypeRequiredDescription
namestringName identifier for the output schema
descriptionstringDescription of what the output represents
strictbooleanEnforce strict schema validation (default: false)
schemaobjectJSON Schema defining the output structure
+ +

Schema Format

+ +

The schema follows the JSON Schema specification. Common schema shapes:

+ +

Simple Object

+ +
schema:
+  type: object
+  properties:
+    name:
+      type: string
+    count:
+      type: integer
+    active:
+      type: boolean
+  required: ["name", "count"]
+ +

Array of Objects

+ +
schema:
+  type: object
+  properties:
+    items:
+      type: array
+      items:
+        type: object
+        properties:
+          id:
+            type: string
+          value:
+            type: number
+        required: ["id", "value"]
+  required: ["items"]
+ +

Enum Values

+ +
schema:
+  type: object
+  properties:
+    status:
+      type: string
+      enum: ["pending", "approved", "rejected"]
+    priority:
+      type: string
+      enum: ["low", "medium", "high", "critical"]
+  required: ["status"]
+ +

Strict Mode

+ +

When strict: true, the model is constrained to only produce output that exactly matches the schema. This provides stronger guarantees but may limit the model's flexibility.

+ +
+
+

strict: false (default)

+

Model aims to match schema but may include additional fields or slight variations.

+
+
+

strict: true

+

Model output is constrained to exactly match the schema. Stronger guarantees.

+
+
+ +

Provider Support

+ +

Structured output support varies by provider:

+ + + + + + + + + + +
ProviderSupportNotes
OpenAI✓ FullNative JSON mode with schema validation
Anthropic✓ FullTool-based structured output
Google Gemini✓ FullNative JSON mode
AWS Bedrock✓ PartialDepends on underlying model
DMR⚠️ LimitedDepends on model capabilities
+ +

Example: Data Extraction Agent

+ +
agents:
+  extractor:
+    model: openai/gpt-4o
+    description: Extract structured data from text
+    instruction: |
+      Extract contact information from the provided text.
+      Return all found contacts in the structured format.
+    structured_output:
+      name: contacts
+      description: Extracted contact information
+      strict: true
+      schema:
+        type: object
+        properties:
+          contacts:
+            type: array
+            items:
+              type: object
+              properties:
+                name:
+                  type: string
+                  description: Full name of the contact
+                email:
+                  type: string
+                  description: Email address
+                phone:
+                  type: string
+                  description: Phone number
+                company:
+                  type: string
+                  description: Company or organization
+              required: ["name"]
+          total_found:
+            type: integer
+            description: Total number of contacts found
+        required: ["contacts", "total_found"]
+ +

Example: Classification Agent

+ +
agents:
+  classifier:
+    model: anthropic/claude-sonnet-4-0
+    description: Classify support tickets
+    instruction: |
+      Classify the support ticket into the appropriate category
+      and priority level based on its content.
+    structured_output:
+      name: ticket_classification
+      strict: true
+      schema:
+        type: object
+        properties:
+          category:
+            type: string
+            enum: ["billing", "technical", "account", "feature_request", "other"]
+          priority:
+            type: string
+            enum: ["low", "medium", "high", "urgent"]
+          confidence:
+            type: number
+            minimum: 0
+            maximum: 1
+            description: Confidence score between 0 and 1
+          reasoning:
+            type: string
+            description: Brief explanation for the classification
+        required: ["category", "priority", "confidence"]
+ +
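A conforming classification might look like this (values are illustrative; note that reasoning may be omitted since it is not listed in required):

```json
{
  "category": "billing",
  "priority": "high",
  "confidence": 0.92,
  "reasoning": "The user disputes a charge on their most recent invoice."
}
```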
+
⚠️ Tool Limitations
+

When using structured output, the agent typically cannot use tools since its response format is constrained to the schema. Design your agent workflow accordingly — structured output agents work best for single-turn analysis or extraction tasks.

+
diff --git a/docs/pages/configuration/tools.html b/docs/pages/configuration/tools.html index 3772c28e8..a760d5660 100644 --- a/docs/pages/configuration/tools.html +++ b/docs/pages/configuration/tools.html @@ -8,7 +8,11 @@

Built-in Tools

Filesystem

Read, write, list, search, and navigate files in the working directory.

toolsets:
-  - type: filesystem
+  - type: filesystem
+    ignore_vcs: false   # Optional: ignore .gitignore files
+    post_edit:          # Optional: run commands after file edits
+      - path: "*.go"
+        cmd: "gofmt -w ${file}"
@@ -23,6 +27,16 @@

Filesystem

OperationDescription
+ + + + + + + + +
PropertyTypeDefaultDescription
ignore_vcsbooleanfalseWhen true, ignores .gitignore patterns and includes all files
post_editarray[]Commands to run after editing files matching a path pattern
post_edit[].pathstringGlob pattern for files (e.g., *.go, src/**/*.ts)
post_edit[].cmdstringCommand to run (use ${file} for the edited file path)
+
💡 Tip

The filesystem tool resolves paths relative to the working directory. Agents can also use absolute paths.

@@ -31,10 +45,21 @@

Filesystem

Shell

Execute arbitrary shell commands. Each call runs in a fresh, isolated shell session — no state persists between calls.

toolsets:
-  - type: shell
+  - type: shell
+    env:                # Optional: environment variables
+      MY_VAR: "value"
+      PATH: "${PATH}:/custom/bin"

The agent has access to the full system shell and environment variables. Commands have a default 30-second timeout. Requires user confirmation unless --yolo is used.

+ + + + + + +
PropertyTypeDescription
envobjectEnvironment variables to set for all shell commands
sandboxobjectRun commands in a Docker container. See Sandbox Mode.
+

Think

Step-by-step reasoning scratchpad. The agent writes its thoughts without producing visible output — ideal for planning, decomposition, and decision-making.

toolsets:
@@ -45,7 +70,8 @@ 

Think

Todo

Task list management. Agents can create, update, and track tasks with status (pending, in-progress, completed).

toolsets:
-  - type: todo
+  - type: todo
+    shared: false   # Optional: share todos across agents
@@ -57,6 +83,13 @@

Todo

OperationDescription
+ + + + + +
PropertyTypeDefaultDescription
sharedbooleanfalseWhen true, todos are shared across all agents in a multi-agent config
+

Memory

Persistent key-value storage backed by SQLite. Data survives across sessions, letting agents remember context, user preferences, and past decisions.

toolsets:
@@ -73,37 +106,152 @@ 

Memory

Fetch

Make HTTP requests to external APIs and web services.

toolsets:
-  - type: fetch
+  - type: fetch
+    timeout: 30   # Optional: request timeout in seconds

Supports GET, POST, PUT, DELETE, and other HTTP methods. The agent can set headers, send request bodies, and receive response data. Useful for calling REST APIs, reading web pages, and downloading content.

+ + + + + +
PropertyTypeDefaultDescription
timeoutint30Request timeout in seconds
+

Script

Define custom shell scripts as named tools. Unlike the generic shell tool, scripts are predefined and can be given descriptive names — ideal for exposing safe, well-scoped operations.

+ +

Simple format:

toolsets:
   - type: script
-    scripts:
-      - name: run_tests
+    shell:
+      run_tests:
+        cmd: task test
         description: Run the project test suite
-        command: task test
-      - name: lint
+      lint:
+        cmd: task lint
         description: Run the linter
-        command: task lint
-      - name: deploy_staging
-        description: Deploy to the staging environment
-        command: ./scripts/deploy.sh staging
+      deploy:
+        cmd: ./scripts/deploy.sh ${env}
+        description: Deploy to an environment
+        args:
+          env:
+            type: string
+            enum: [staging, production]
+        required: [env]
PropertyTypeDescription
scripts[].namestringTool name the agent sees and calls
scripts[].descriptionstringDescription shown to the model for tool selection
scripts[].commandstringShell command to execute when the tool is called
shell.<name>.cmdstringShell command to execute (supports ${arg} interpolation)
shell.<name>.descriptionstringDescription shown to the model
shell.<name>.argsobjectParameter definitions (JSON Schema properties)
shell.<name>.requiredarrayRequired parameter names
shell.<name>.envobjectEnvironment variables for this script
shell.<name>.working_dirstringWorking directory for script execution

Transfer Task

The transfer_task tool is automatically available when an agent has sub_agents. Allows delegating tasks to sub-agents. No configuration needed — it's enabled implicitly.

+

LSP (Language Server Protocol)

+

Connect to language servers for code intelligence: go-to-definition, find references, diagnostics, and more.

+
toolsets:
+  - type: lsp
+    command: gopls
+    args: []
+    file_types: [".go"]
+ + + + + + + + + +
PropertyTypeDescription
commandstringLSP server executable command
argsarrayCommand-line arguments for the LSP server
envobjectEnvironment variables for the LSP process
file_typesarrayFile extensions this LSP handles
+ +

See LSP Tool for full documentation.

+ +

User Prompt

+

Ask users questions and collect interactive input during agent execution.

+
toolsets:
+  - type: user_prompt
+ +

The agent can use this tool to ask questions, present choices, or collect information from the user. Supports JSON Schema for structured input validation.

+ +

See User Prompt Tool for full documentation.

+ +

API

+

Create custom tools that call HTTP APIs without writing code.

+
toolsets:
+  - type: api
+    name: get_weather
+    method: GET
+    endpoint: "https://api.weather.example/v1/current?city=${city}"
+    instruction: Get current weather for a city
+    args:
+      city:
+        type: string
+        description: City name
+    required: ["city"]
+    headers:
+      Authorization: "Bearer ${env.WEATHER_API_KEY}"
+ + + + + + + + + + + +
PropertyTypeDescription
namestringTool name
methodstringHTTP method: GET or POST
endpointstringURL with ${param} interpolation
argsobjectParameter definitions
requiredarrayRequired parameter names
headersobjectHTTP headers (supports ${env.VAR})
+ +

See API Tool for full documentation.

+ +

Handoff

+

Delegate tasks to remote agents via the A2A (Agent-to-Agent) protocol.

+
toolsets:
+  - type: handoff
+    name: research_agent
+    description: Specialized research agent
+    url: "http://localhost:8080/a2a"
+    timeout: 5m
+ + + + + + + + + +
PropertyTypeDescription
namestringTool name for delegation
descriptionstringDescription for the agent
urlstringA2A server endpoint URL
timeoutstringRequest timeout (default: 5m)
+ +

See A2A Protocol for full documentation.

+ +

A2A (Agent-to-Agent)

+

Connect to remote agents via the A2A protocol. Similar to handoff but configured as a toolset.

+
toolsets:
+  - type: a2a
+    name: research_agent
+    url: "http://localhost:8080/a2a"
+ + + + + + + +
PropertyTypeDescription
namestringTool name for the remote agent
urlstringA2A server endpoint URL
+ +

See A2A Protocol for full documentation.

+

MCP Tools

Extend agents with external tools via the Model Context Protocol.

@@ -128,6 +276,7 @@

Docker MCP (Recommended)

refstringDocker MCP reference (docker:name)
toolsarrayOptional: only expose these tools
instructionstringCustom instructions injected into the agent's context
+configanyMCP server-specific configuration (passed during initialization)
@@ -220,6 +369,11 @@

Combined Example

  - type: todo
  - type: memory
    path: ./dev.db
+  - type: user_prompt
+  # LSP for code intelligence
+  - type: lsp
+    command: gopls
+    file_types: [".go"]
  # Custom scripts
  - type: script
    scripts:
@@ -229,6 +383,12 @@

Combined Example

      - name: lint
        description: Run the linter
        command: task lint
+  # Custom API tool
+  - type: api
+    name: get_status
+    method: GET
+    endpoint: "https://api.example.com/status"
+    instruction: Check service health
  # Docker MCP tools
  - type: mcp
    ref: docker:github-official
diff --git a/docs/pages/features/cli.html b/docs/pages/features/cli.html
index 3b85ecc21..14130b21e 100644
--- a/docs/pages/features/cli.html
+++ b/docs/pages/features/cli.html
@@ -19,8 +19,8 @@

cagent run

-a, --agent <name>Run a specific agent from the config --yoloAuto-approve all tool calls --model <ref>Override model(s). Use provider/model for all agents, or agent=provider/model for specific agents. Comma-separate multiple overrides. - --session <id>Resume a previous session. Supports relative refs (-1, -2) - --prompt-file <path>Include file contents as system context + --session <id>Resume a previous session. Supports relative refs (-1 = last, -2 = second to last) + --prompt-file <path>Include file contents as additional system context (repeatable) -c <name>Run a named command from the YAML config -d, --debugEnable debug logging --log-file <path>Custom debug log location @@ -35,7 +35,11 @@

cagent run

$ cagent run agent.yaml --model anthropic/claude-sonnet-4-0
$ cagent run agent.yaml --model "dev=openai/gpt-4o,reviewer=anthropic/claude-sonnet-4-0"
$ cagent run agent.yaml --session -1   # resume last session
-$ cagent run agent.yaml -c df # run named command
+$ cagent run agent.yaml -c df                      # run named command
+$ cagent run agent.yaml --prompt-file ./context.md # include file as context
+
+# Queue multiple messages (processed in sequence)
+$ cagent run agent.yaml "question 1" "question 2" "question 3"

cagent exec

Run an agent in non-interactive (headless) mode. No TUI — output goes to stdout.

@@ -116,7 +120,21 @@

cagent push / cagent pull

cagent eval

Run agent evaluations.

-
$ cagent eval eval-config.yaml
+
$ cagent eval eval-config.yaml
+
+# With flags
+$ cagent eval agent.yaml ./evals -c 8              # 8 concurrent evaluations
+$ cagent eval agent.yaml --keep-containers         # Keep containers for debugging
+$ cagent eval agent.yaml --only "auth*"            # Only run matching evals
+ +

cagent build

+

Build agent configuration into a distributable format.

+ +
$ cagent build [config] [flags]
+
+# Examples
+$ cagent build agent.yaml
+$ cagent build agent.yaml -o ./dist

cagent alias

Manage agent aliases for quick access.

diff --git a/docs/pages/features/evaluation.html b/docs/pages/features/evaluation.html index 0bb6985b8..bca8a52e6 100644 --- a/docs/pages/features/evaluation.html +++ b/docs/pages/features/evaluation.html @@ -195,10 +195,16 @@

Output

+
+
💡 Debugging Failed Evals
+

Use --keep-containers to preserve containers after evaluation. You can then inspect them with docker exec to understand why an eval failed. The session database (.db file) contains the full conversation history for each eval.

+
+
$ cagent eval demo.yaml ./evals
 
   ✓ Counting Files in Local Folder
diff --git a/docs/pages/features/rag.html b/docs/pages/features/rag.html
index 2b7cbc4ae..378d5aa8a 100644
--- a/docs/pages/features/rag.html
+++ b/docs/pages/features/rag.html
@@ -168,13 +168,13 @@ 

Configuration Reference

Top-Level RAG Fields

- + - - - - - + + + + +
FieldTypeDescription
FieldTypeDefaultDescription
docs[]stringDocument paths/directories (shared across strategies)
descriptionstringHuman-readable description of this RAG source
respect_vcsbooleanRespect .gitignore files (default: true)
strategies[]objectArray of retrieval strategy configurations
resultsobjectPost-processing: fusion, reranking, deduplication, final limit
docs[]stringDocument paths/directories (shared across strategies)
descriptionstringHuman-readable description of this RAG source
respect_vcsbooleantrueRespect .gitignore files when indexing documents
strategies[]objectArray of retrieval strategy configurations
resultsobjectPost-processing: fusion, reranking, deduplication, final limit
@@ -239,8 +239,10 @@

Results (Post-Processing)

fusion.strategystringrrfFusion method: rrf, weighted, or max fusion.kint60RRF rank constant - deduplicateboolfalseRemove duplicate results - limitint5Final number of results + deduplicatebooltrueRemove duplicate results + limitint15Final number of results + include_scoreboolfalseInclude relevance scores in results + return_full_contentboolfalseReturn full document content instead of just matched chunks reranking.modelstring—Reranking model reference reranking.top_kint(all)Only rerank top K results reranking.thresholdfloat0.5Minimum relevance score after reranking diff --git a/docs/pages/features/skills.html b/docs/pages/features/skills.html index f8204bd54..d2b39fce4 100644 --- a/docs/pages/features/skills.html +++ b/docs/pages/features/skills.html @@ -53,9 +53,9 @@

Global

- - - + + +
PathSearch Type
~/.codex/skills/Recursive
~/.claude/skills/Flat
~/.agents/skills/Recursive
~/.codex/skills/Recursive (searches all subdirectories)
~/.claude/skills/Flat (immediate children only)
~/.agents/skills/Recursive (searches all subdirectories)
@@ -68,6 +68,22 @@

Project (from git root to current directory)

+

Invoking Skills

+ +

Skills can be invoked in multiple ways:

+ +
  • Automatic: The agent detects when your request matches a skill's description and loads it automatically
  • Explicit: Reference the skill name in your prompt: "Use the create-dockerfile skill to..."
  • Slash command: Use /{skill-name} to invoke a skill directly
+ +
# In the TUI, invoke skill directly:
+/create-dockerfile
+
+# Or mention it in your message:
+"Create a dockerfile for my Python app (use the create-dockerfile skill)"
+

Precedence

When multiple skills share the same name:

diff --git a/docs/pages/features/tui.html b/docs/pages/features/tui.html index ac234cc2a..efcc62063 100644 --- a/docs/pages/features/tui.html +++ b/docs/pages/features/tui.html @@ -120,13 +120,20 @@

Keyboard Shortcuts

Ctrl+KOpen command palette
Ctrl+MSwitch model
- Ctrl+RReverse history search
- Ctrl+LAudio listening mode
- Ctrl+ZSuspend to background
+ Ctrl+RReverse history search (search previous inputs)
+ Ctrl+LStart audio listening mode (voice input)
+ Ctrl+ZSuspend TUI to background (resume with fg)
+ Ctrl+XClear queued messages
EscapeCancel current operation
+ EnterSend message (or newline with Shift+Enter)
+ Up/DownNavigate message history
+

History Search

+ +

Press Ctrl+R to enter incremental history search mode. Start typing to filter through your previous inputs. Press Enter to select a match, or Escape to cancel.

+

Theming

Customize the TUI appearance with built-in or custom themes:

diff --git a/docs/pages/guides/go-sdk.html b/docs/pages/guides/go-sdk.html new file mode 100644 index 000000000..f9db9b769 --- /dev/null +++ b/docs/pages/guides/go-sdk.html @@ -0,0 +1,344 @@ +

Go SDK

+

Use cagent as a Go library to embed AI agents in your applications.

+ +

Overview

+ +

cagent can be used as a Go library, allowing you to build AI agents directly into your Go applications. This gives you full programmatic control over agent creation, tool integration, and execution.

+ +
+
ℹ️ Import Path
+
import "github.com/docker/cagent/pkg/..."
+
+ +

Core Packages

+ + + + + + + + + + + + + + +
PackagePurpose
pkg/agentAgent creation and configuration
pkg/runtimeAgent execution and event streaming
pkg/sessionConversation state management
pkg/teamMulti-agent team composition
pkg/toolsTool interface and utilities
pkg/tools/builtinBuilt-in tools (shell, filesystem, etc.)
pkg/model/provider/*Model provider clients
pkg/config/latestConfiguration types
pkg/environmentEnvironment and secrets
+ +

Basic Example

+ +

Create a simple agent and run it:

+ +
package main
+
+import (
+    "context"
+    "fmt"
+    "log"
+    "os/signal"
+    "syscall"
+
+    "github.com/docker/cagent/pkg/agent"
+    "github.com/docker/cagent/pkg/config/latest"
+    "github.com/docker/cagent/pkg/environment"
+    "github.com/docker/cagent/pkg/model/provider/openai"
+    "github.com/docker/cagent/pkg/runtime"
+    "github.com/docker/cagent/pkg/session"
+    "github.com/docker/cagent/pkg/team"
+)
+
+func main() {
+    ctx, cancel := signal.NotifyContext(context.Background(), 
+        syscall.SIGINT, syscall.SIGTERM)
+    defer cancel()
+
+    if err := run(ctx); err != nil {
+        log.Fatal(err)
+    }
+}
+
+func run(ctx context.Context) error {
+    // Create model provider
+    llm, err := openai.NewClient(
+        ctx,
+        &latest.ModelConfig{
+            Provider: "openai",
+            Model:    "gpt-4o",
+        },
+        environment.NewDefaultProvider(),
+    )
+    if err != nil {
+        return err
+    }
+
+    // Create agent
+    assistant := agent.New(
+        "root",
+        "You are a helpful assistant.",
+        agent.WithModel(llm),
+        agent.WithDescription("A helpful assistant"),
+    )
+
+    // Create team and runtime
+    t := team.New(team.WithAgents(assistant))
+    rt, err := runtime.New(t)
+    if err != nil {
+        return err
+    }
+
+    // Run with a user message
+    sess := session.New(
+        session.WithUserMessage("What is 2 + 2?"),
+    )
+    
+    messages, err := rt.Run(ctx, sess)
+    if err != nil {
+        return err
+    }
+
+    // Print the response
+    fmt.Println(messages[len(messages)-1].Message.Content)
+    return nil
+}
+ +

Custom Tools

+ +

Define custom tools for your agent:

+ +
package main
+
+import (
+    "context"
+    "encoding/json"
+    "fmt"
+
+    "github.com/docker/cagent/pkg/agent"
+    "github.com/docker/cagent/pkg/tools"
+)
+
+// Define the tool's input schema
+type AddNumbersArgs struct {
+    A int `json:"a"`
+    B int `json:"b"`
+}
+
+// Implement the tool handler
+func addNumbers(_ context.Context, toolCall tools.ToolCall) (*tools.ToolCallResult, error) {
+    var args AddNumbersArgs
+    if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &args); err != nil {
+        return nil, err
+    }
+
+    result := args.A + args.B
+    return tools.ResultSuccess(fmt.Sprintf("%d", result)), nil
+}
+
+func main() {
+    // Create the tool definition
+    addTool := tools.Tool{
+        Name:        "add",
+        Category:    "math",
+        Description: "Add two numbers together",
+        Parameters:  tools.MustSchemaFor[AddNumbersArgs](),
+        Handler:     addNumbers,
+    }
+
+    // Use with an agent (llm is a model provider client, created as in the Basic Example)
+    calculator := agent.New(
+        "root",
+        "You are a calculator. Use the add tool for arithmetic.",
+        agent.WithModel(llm),
+        agent.WithTools(addTool),
+    )
+    // ...
+}
+ +

Streaming Responses

+ +

Process events as they happen:

+ +
func runStreaming(ctx context.Context, rt *runtime.Runtime, sess *session.Session) error {
+    events := rt.RunStream(ctx, sess)
+    
+    for event := range events {
+        switch e := event.(type) {
+        case *runtime.StreamStartedEvent:
+            fmt.Println("Stream started")
+            
+        case *runtime.AgentChoiceEvent:
+            // Print response chunks as they arrive
+            fmt.Print(e.Content)
+            
+        case *runtime.ToolCallEvent:
+            fmt.Printf("\n[Tool call: %s]\n", e.ToolCall.Function.Name)
+            
+        case *runtime.ToolCallConfirmationEvent:
+            // Auto-approve tool calls
+            rt.Resume(ctx, runtime.ResumeRequest{
+                Type: runtime.ResumeTypeApproveSession,
+            })
+            
+        case *runtime.ToolCallResponseEvent:
+            fmt.Printf("[Tool response: %s]\n", e.Response)
+            
+        case *runtime.StreamStoppedEvent:
+            fmt.Println("\nStream stopped")
+            
+        case *runtime.ErrorEvent:
+            return fmt.Errorf("error: %s", e.Error)
+        }
+    }
+    
+    return nil
+}
+ +

Multi-Agent Teams

+ +

Create agents that delegate to sub-agents:

+ +
package main
+
+import (
+    "github.com/docker/cagent/pkg/agent"
+    "github.com/docker/cagent/pkg/model/provider"
+    "github.com/docker/cagent/pkg/team"
+    "github.com/docker/cagent/pkg/tools/builtin"
+)
+
+func createTeam(llm provider.Provider) *team.Team {
+    // Create a child agent
+    researcher := agent.New(
+        "researcher",
+        "You research topics thoroughly.",
+        agent.WithModel(llm),
+        agent.WithDescription("Research specialist"),
+    )
+
+    // Create root agent with sub-agents
+    coordinator := agent.New(
+        "root",
+        "You coordinate research tasks.",
+        agent.WithModel(llm),
+        agent.WithDescription("Team coordinator"),
+        agent.WithSubAgents(researcher),
+        agent.WithToolSets(builtin.NewTransferTaskTool()),
+    )
+
+    return team.New(team.WithAgents(coordinator, researcher))
+}
+ +

Built-in Tools

+ +

Use cagent's built-in tools:

+ +
import (
+    "os"
+
+    "github.com/docker/cagent/pkg/agent"
+    "github.com/docker/cagent/pkg/config"
+    "github.com/docker/cagent/pkg/model/provider"
+    "github.com/docker/cagent/pkg/tools/builtin"
+)
+
+func createAgentWithBuiltinTools(llm provider.Provider) *agent.Agent {
+    // Runtime config for tools that need it
+    rtConfig := &config.RuntimeConfig{
+        Config: config.Config{
+            WorkingDir: "/path/to/workdir",
+        },
+    }
+
+    return agent.New(
+        "root",
+        "You are a developer assistant.",
+        agent.WithModel(llm),
+        agent.WithToolSets(
+            // Shell tool for running commands
+            builtin.NewShellTool(os.Environ(), rtConfig, nil),
+            // Filesystem tools
+            builtin.NewFilesystemTool(rtConfig.Config.WorkingDir, nil),
+            // Think tool for reasoning
+            builtin.NewThinkTool(),
+            // Todo tool for task tracking
+            builtin.NewTodoTool(false), // false = not shared
+        ),
+    )
+}
+ +

Using Different Providers

+ +
import (
+    "github.com/docker/cagent/pkg/model/provider/anthropic"
+    "github.com/docker/cagent/pkg/model/provider/gemini"
+    "github.com/docker/cagent/pkg/model/provider/openai"
+)
+
+// OpenAI
+openaiClient, _ := openai.NewClient(ctx, &latest.ModelConfig{
+    Provider: "openai",
+    Model:    "gpt-4o",
+}, env)
+
+// Anthropic
+anthropicClient, _ := anthropic.NewClient(ctx, &latest.ModelConfig{
+    Provider: "anthropic",
+    Model:    "claude-sonnet-4-0",
+    MaxTokens: 64000,
+}, env)
+
+// Google Gemini
+geminiClient, _ := gemini.NewClient(ctx, &latest.ModelConfig{
+    Provider: "google",
+    Model:    "gemini-2.5-flash",
+}, env)
+ +

Session Options

+ +
import "github.com/docker/cagent/pkg/session"
+
+sess := session.New(
+    // Set a title for the session
+    session.WithTitle("Code Review Task"),
+    
+    // Add user message
+    session.WithUserMessage("Review this code for bugs"),
+    
+    // Limit iterations
+    session.WithMaxIterations(20),
+    
+    // Include a file attachment
+    session.WithUserMessage("main.go", codeContent),
+)
+ +

Error Handling

+ +
messages, err := rt.Run(ctx, sess)
+if err != nil {
+    if errors.Is(err, context.Canceled) {
+        // User cancelled
+        log.Println("Operation cancelled")
+        return nil
+    }
+    if errors.Is(err, context.DeadlineExceeded) {
+        // Timeout
+        log.Println("Operation timed out")
+        return nil
+    }
+    // Other error
+    return fmt.Errorf("runtime error: %w", err)
+}
+
+// Check for errors in the event stream
+for event := range rt.RunStream(ctx, sess) {
+    if errEvent, ok := event.(*runtime.ErrorEvent); ok {
+        return fmt.Errorf("stream error: %s", errEvent.Error)
+    }
+}
+ +

Complete Example

+ +

See the examples/golibrary directory for complete working examples:

+ +
    +
  • simple/ — Basic agent with no tools
  • +
  • tool/ — Custom tool implementation
  • +
  • stream/ — Streaming event handling
  • +
  • multi/ — Multi-agent with sub-agents
  • +
  • builtintool/ — Using built-in tools
  • +
diff --git a/docs/pages/guides/tips.html b/docs/pages/guides/tips.html new file mode 100644 index 000000000..7c0823575 --- /dev/null +++ b/docs/pages/guides/tips.html @@ -0,0 +1,349 @@ +

Tips & Best Practices

+

Expert guidance for building effective, efficient, and secure agents.

+ +

Configuration Tips

+ +

Auto Mode for Quick Start

+ +

Don't have a config file? cagent can automatically detect your available API keys and use an appropriate model:

+ +
# Automatically uses the best available provider
+cagent run
+
+# Provider priority: OpenAI → Anthropic → Google → Mistral → DMR
+ +

The special auto model value also works in configs:

+ +
agents:
+  root:
+    model: auto  # Uses best available provider
+    description: Adaptive assistant
+    instruction: You are a helpful assistant.
+ +

Environment Variable Interpolation

+ +

Commands support JavaScript template literal syntax for environment variables:

+ +
agents:
+  root:
+    model: openai/gpt-4o
+    description: Deployment assistant
+    instruction: You help with deployments.
+    commands:
+      # Simple variable
+      greet: "Hello ${env.USER}!"
+      
+      # With default value
+      deploy: "Deploy to ${env.ENV || 'staging'}"
+      
+      # Multiple variables
+      release: "Release ${env.PROJECT} v${env.VERSION || '1.0.0'}"
+ +

Model Aliases Are Auto-Pinned

+ +

cagent automatically resolves model aliases to their latest pinned versions. This ensures reproducible behavior:

+ +
# You write:
+model: anthropic/claude-sonnet-4-5
+
+# cagent resolves to:
+# anthropic/claude-sonnet-4-5-20250929 (or latest available)
+ +

To use a specific version, specify it explicitly in your config.

+ +
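For example, to opt out of alias resolution, pin the dated release directly (the exact version string below is illustrative):

```yaml
agents:
  root:
    # Pin an exact dated release instead of the rolling alias
    model: anthropic/claude-sonnet-4-5-20250929
    description: Pinned assistant
    instruction: You are a helpful assistant.
```

Pinning trades automatic updates for strict reproducibility.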

Performance Tips

+ +

Defer Tools for Faster Startup

+ +

Large MCP toolsets can slow down agent startup. Use defer to load tools on-demand:

+ +
agents:
+  root:
+    model: openai/gpt-4o
+    description: Multi-tool assistant
+    instruction: You have many tools available.
+    toolsets:
+      - type: mcp
+        ref: docker:github-official
+      - type: mcp
+        ref: docker:slack
+      - type: mcp
+        ref: docker:linear
+    # Load all tools on first use
+    defer: true
+ +

Or defer specific tools:

+ +
defer:
+  - "mcp:github:*"    # Defer GitHub tools
+  - "mcp:slack:*"     # Defer Slack tools
+ +

Filter MCP Tools

+ +

Many MCP servers expose dozens of tools. Filter to only what you need:

+ +
toolsets:
+  - type: mcp
+    ref: docker:github-official
+    # Only expose these specific tools
+    tools:
+      - list_issues
+      - create_issue
+      - get_pull_request
+      - create_pull_request
+ +

Exposing fewer tools speeds up tool selection and reduces confusion for the model.

+ +

Set max_iterations

+ +

Always set max_iterations for agents with powerful tools to prevent infinite loops:

+ +
agents:
+  developer:
+    model: anthropic/claude-sonnet-4-0
+    description: Development assistant
+    instruction: You are a developer.
+    max_iterations: 30  # Reasonable limit for development tasks
+    toolsets:
+      - type: filesystem
+      - type: shell
+ +

Typical values: 20-30 for development agents, 10-15 for simple tasks.

+ +

Reliability Tips

+ +

Use Fallback Models

+ +

Configure fallback models for resilience against provider outages or rate limits:

+ +
agents:
+  root:
+    model: anthropic/claude-sonnet-4-0
+    description: Reliable assistant
+    instruction: You are a helpful assistant.
+    fallback:
+      models:
+        # Different provider for resilience
+        - openai/gpt-4o
+        # Cheaper model as last resort
+        - openai/gpt-4o-mini
+      retries: 2      # Retry 5xx errors twice
+      cooldown: 1m    # Stick with fallback for 1 min after rate limit
+ +

Best practices for fallback chains:

+
    +
  • Use different providers for true redundancy
  • +
  • Order by preference (best first)
  • +
  • Include a cheaper/faster model as last resort
  • +
+ +

Use Think Tool for Complex Tasks

+ +

The think tool dramatically improves reasoning quality with minimal overhead:

+ +
toolsets:
+  - type: think  # Always include for complex agents
+ +

The agent uses it as a scratchpad for planning and decision-making.

+ +

Security Tips

+ +

Use --yolo Mode Carefully

+ +

The --yolo flag auto-approves all tool calls without confirmation:

+ +
# Auto-approve everything (use with caution!)
+cagent run agent.yaml --yolo
+ +

When it's appropriate:

+
    +
  • CI/CD pipelines with controlled inputs
  • +
  • Automated testing
  • +
  • Agents with only safe, read-only tools
  • +
+ +

When to avoid:

+
    +
  • Interactive sessions with untested prompts
  • +
  • Agents with shell or filesystem write access
  • +
  • Any situation where unreviewed actions could cause harm
  • +
+ +

Combine Permissions with Sandbox

+ +

For defense in depth, use both permissions and sandbox mode:

+ +
agents:
+  secure_dev:
+    model: anthropic/claude-sonnet-4-0
+    description: Secure development assistant
+    instruction: You are a secure coding assistant.
+    toolsets:
+      - type: filesystem
+      - type: shell
+    # Layer 1: Permission controls
+    permissions:
+      allow:
+        - "read_*"
+        - "shell:cmd=go*"
+        - "shell:cmd=npm*"
+      deny:
+        - "shell:cmd=sudo*"
+        - "shell:cmd=rm*-rf*"
+    # Layer 2: Container isolation
+    sandbox:
+      image: golang:1.23-alpine
+      paths:
+        - ".:rw"
+ +

Use Hooks for Audit Logging

+ +

Log all tool calls for compliance or debugging:

+ +
agents:
+  audited:
+    model: openai/gpt-4o
+    description: Audited assistant
+    instruction: You are a helpful assistant.
+    hooks:
+      post_tool_use:
+        - matcher: "*"
+          hooks:
+            - type: command
+              command: "./scripts/audit-log.sh"
+ +

Multi-Agent Tips

+ +

Handoffs vs Sub-Agents

+ +

Understand the difference between sub_agents and handoffs:

+ +
+
+

sub_agents (transfer_task)

+

Delegates task to a child, waits for result, then continues. The parent remains in control.

+
sub_agents: [researcher, writer]
+
+
+

handoffs (A2A)

+

Transfers control entirely to another agent (possibly remote). One-way handoff.

+
handoffs:
+  - name: specialist
+    url: http://...
+
+
+ +

Give Sub-Agents Clear Descriptions

+ +

The root agent uses descriptions to decide which sub-agent to delegate to:

+ +
agents:
+  root:
+    model: anthropic/claude-sonnet-4-0
+    description: Technical lead
+    instruction: Delegate to specialists based on the task.
+    sub_agents: [frontend, backend, devops]
+
+  frontend:
+    model: openai/gpt-4o
+    # Good: specific and actionable
+    description: |
+      Frontend specialist. Handles React, TypeScript, CSS, 
+      UI components, and browser-related issues.
+
+  backend:
+    model: openai/gpt-4o
+    # Good: clear domain boundaries
+    description: |
+      Backend specialist. Handles APIs, databases, 
+      server logic, and Go/Python code.
+
+  devops:
+    model: openai/gpt-4o
+    description: |
+      DevOps specialist. Handles CI/CD, Docker, Kubernetes,
+      infrastructure, and deployment pipelines.
+ +

Debugging Tips

+ +

Enable Debug Logging

+ +

Use the --debug flag to see detailed execution logs:

+ +
# Default log location: ~/.cagent/cagent.debug.log
+cagent run agent.yaml --debug
+
+# Custom log location
+cagent run agent.yaml --debug --log-file ./debug.log
+ +

Check Token Usage

+ +

Use the /usage command during a session to see token consumption:

+ +
/usage
+
+Token Usage:
+  Input:  12,456 tokens
+  Output:  3,789 tokens
+  Total:  16,245 tokens
+ +

Compact Long Sessions

+ +

If a session gets too long, use /compact to summarize and reduce context:

+ +
/compact
+
+Session compacted. Summary generated and history trimmed.
+ +

More Tips

+ +

User-Defined Default Model

+ +

Set your preferred default model in ~/.config/cagent/config.yaml:

+ +
settings:
+  default_model: anthropic/claude-sonnet-4-0
+ +

This model is used when you run cagent run without a config file.

+ +

GitHub PR Reviewer Example

+ +

Use cagent as a GitHub Actions PR reviewer:

+ +
# .github/workflows/pr-review.yml
+name: PR Review
+on:
+  pull_request:
+    types: [opened, synchronize]
+
+jobs:
+  review:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run cagent review
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          # Install cagent
+          curl -fsSL https://get.cagent.dev | sh
+          
+          # Run the review
+          cagent exec reviewer.yaml --yolo \
+            "Review PR #${{ github.event.pull_request.number }}"
+ +

With a simple reviewer agent:

+ +
# reviewer.yaml
+agents:
+  root:
+    model: anthropic/claude-sonnet-4-0
+    description: PR reviewer
+    instruction: |
+      Review pull requests for code quality, bugs, and security issues.
+      Be constructive and specific in your feedback.
+    toolsets:
+      - type: mcp
+        ref: docker:github-official
+      - type: think
diff --git a/docs/pages/providers/local.html b/docs/pages/providers/local.html new file mode 100644 index 000000000..8bf6b39b7 --- /dev/null +++ b/docs/pages/providers/local.html @@ -0,0 +1,194 @@ +

Local Models (Ollama, vLLM, LocalAI)

+

Run cagent with locally hosted models for privacy, offline use, or cost savings.

+ +

Overview

+ +

cagent can connect to any OpenAI-compatible local model server. This guide covers the most popular options:

+ +
    +
  • Ollama — Easy-to-use local model runner
  • +
  • vLLM — High-performance inference server
  • +
  • LocalAI — OpenAI-compatible API for various backends
  • +
+ +
+
💡 Docker Model Runner
+

For the easiest local model experience, consider Docker Model Runner which is built into Docker Desktop and requires no additional setup.

+
+ +

Ollama

+ +

Ollama is a popular tool for running LLMs locally. cagent includes a built-in ollama alias for easy configuration.

+ +

Setup

+ +
  1. Install Ollama from ollama.ai
  2. Pull a model:
     ollama pull llama3.2
     ollama pull qwen2.5-coder
  3. Start the Ollama server (usually runs automatically):
     ollama serve
+ +

Configuration

+ +

Use the built-in ollama alias:

+ +
agents:
+  root:
+    model: ollama/llama3.2
+    description: Local assistant
+    instruction: You are a helpful assistant.
+ +

The ollama alias automatically uses:

+
    +
  • Base URL: http://localhost:11434/v1
  • +
  • API Type: OpenAI-compatible
  • +
  • No API key required
  • +
+ +

Custom Port or Host

+ +

If Ollama runs on a different host or port:

+ +
models:
+  my_ollama:
+    provider: ollama
+    model: llama3.2
+    base_url: http://192.168.1.100:11434/v1
+
+agents:
+  root:
+    model: my_ollama
+    description: Remote Ollama assistant
+    instruction: You are a helpful assistant.
+ +

Popular Ollama Models

+ + + + + + + + + + + +
ModelSizeBest For
llama3.23BGeneral purpose, fast
llama3.18BBetter reasoning
qwen2.5-coder7BCode generation
mistral7BGeneral purpose
codellama7BCode tasks
deepseek-coder6.7BCode generation
+ +

vLLM

+ +

vLLM is a high-performance inference server optimized for throughput.

+ +

Setup

+ +
# Install vLLM
+pip install vllm
+
+# Start the server
+python -m vllm.entrypoints.openai.api_server \
+  --model meta-llama/Llama-3.2-3B-Instruct \
+  --port 8000
+ +

Configuration

+ +
providers:
+  vllm:
+    api_type: openai_chatcompletions
+    base_url: http://localhost:8000/v1
+
+agents:
+  root:
+    model: vllm/meta-llama/Llama-3.2-3B-Instruct
+    description: vLLM-powered assistant
+    instruction: You are a helpful assistant.
+ +

LocalAI

+ +

LocalAI provides an OpenAI-compatible API that works with various backends.

+ +

Setup

+ +
# Run with Docker
+docker run -p 8080:8080 --name local-ai \
+  -v ./models:/models \
+  localai/localai:latest-cpu
+ +

Configuration

+ +
providers:
+  localai:
+    api_type: openai_chatcompletions
+    base_url: http://localhost:8080/v1
+
+agents:
+  root:
+    model: localai/gpt4all-j
+    description: LocalAI assistant
+    instruction: You are a helpful assistant.
+ +

Generic Custom Provider

+ +

For any OpenAI-compatible server:

+ +
providers:
+  my_server:
+    api_type: openai_chatcompletions
+    base_url: http://localhost:8000/v1
+    # token_key: MY_API_KEY  # if auth required
+
+agents:
+  root:
+    model: my_server/model-name
+    description: Custom server assistant
+    instruction: You are a helpful assistant.
+ +

Performance Tips

+ +
+
ℹ️ Local Model Considerations
+
    +
  • Memory: Larger models need more RAM/VRAM. A 7B model typically needs 8-16GB RAM.
  • +
  • GPU: GPU acceleration dramatically improves speed. Check your server's GPU support.
  • +
  • Context length: Local models often have smaller context windows than cloud models.
  • +
  • Tool calling: Not all local models support function/tool calling. Test your model's capabilities.
  • +
+
+ +

Example: Offline Development Agent

+ +
agents:
+  developer:
+    model: ollama/qwen2.5-coder
+    description: Offline code assistant
+    instruction: |
+      You are a software developer working offline.
+      Focus on code quality and clear explanations.
+    max_iterations: 20
+    toolsets:
+      - type: filesystem
+      - type: shell
+      - type: think
+      - type: todo
+ +

Troubleshooting

+ +

Connection Refused

+

Ensure your model server is running and accessible:

+
curl http://localhost:11434/v1/models  # Ollama
+curl http://localhost:8000/v1/models   # vLLM
+ +

Model Not Found

+

Verify the model is downloaded/available:

+
ollama list  # List available Ollama models
+ +

Slow Responses

+
    +
  • Check if GPU acceleration is enabled
  • +
  • Try a smaller model
  • +
  • Reduce max_tokens in your config
  • +
diff --git a/docs/pages/providers/mistral.html b/docs/pages/providers/mistral.html new file mode 100644 index 000000000..d4855e936 --- /dev/null +++ b/docs/pages/providers/mistral.html @@ -0,0 +1,110 @@ +

Mistral

+

Use Mistral AI models with cagent.

+ +

Overview

+ +

Mistral AI provides powerful language models through an OpenAI-compatible API. cagent includes built-in support for Mistral as an alias provider.

+ +

Setup

+ +
  1. Get an API key from Mistral Console
  2. Set the environment variable:
     export MISTRAL_API_KEY=your-api-key
+ +

Usage

+ +

Inline Syntax

+ +

The simplest way to use Mistral:

+ +
agents:
+  root:
+    model: mistral/mistral-large-latest
+    description: Assistant using Mistral
+    instruction: You are a helpful assistant.
+ +

Named Model

+ +

For more control over parameters:

+ +
models:
+  mistral:
+    provider: mistral
+    model: mistral-large-latest
+    temperature: 0.7
+    max_tokens: 8192
+
+agents:
+  root:
+    model: mistral
+    description: Assistant using Mistral
+    instruction: You are a helpful assistant.
+ +

Available Models

+ + + + + + + + + + + + +
ModelDescriptionContext
mistral-large-latestMost capable Mistral model128K
mistral-medium-latestBalanced performance and cost128K
mistral-small-latestFast and cost-effective (default)128K
codestral-latestOptimized for code generation32K
open-mistral-nemoOpen-weight model128K
ministral-8b-latestCompact 8B parameter model128K
ministral-3b-latestSmallest Mistral model128K
+ +

Check the Mistral Models documentation for the latest available models.

+ +

Auto-Detection

+ +

When you run cagent run without specifying a config, cagent automatically detects available providers. If MISTRAL_API_KEY is set and higher-priority providers (OpenAI, Anthropic, Google) are not available, Mistral will be used with mistral-small-latest as the default model.

+ +

Extended Thinking

+ +

Mistral models support thinking mode through the OpenAI-compatible API. By default, cagent enables medium thinking effort:

+ +
models:
+  mistral:
+    provider: mistral
+    model: mistral-large-latest
+    thinking_budget: high  # minimal, low, medium, high, or none
+ +

To disable thinking:

+ +
models:
+  mistral:
+    provider: mistral
+    model: mistral-large-latest
+    thinking_budget: none
+ +

How It Works

+ +

Mistral is implemented as a built-in alias in cagent:

+ +
    +
  • API Type: OpenAI-compatible (openai_chatcompletions)
  • +
  • Base URL: https://api.mistral.ai/v1
  • +
  • Token Variable: MISTRAL_API_KEY
  • +
+ +

This means Mistral uses the same client as OpenAI, making it fully compatible with all OpenAI features supported by cagent.

+ +
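Based on the details above, the built-in alias behaves roughly as if you had declared the provider yourself with the top-level providers section (you normally don't need to do this):

```yaml
providers:
  mistral:
    api_type: openai_chatcompletions
    base_url: https://api.mistral.ai/v1
    token_key: MISTRAL_API_KEY
```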

Example: Code Assistant

+ +
agents:
+  coder:
+    model: mistral/codestral-latest
+    description: Expert code assistant
+    instruction: |
+      You are an expert programmer using Codestral.
+      Write clean, efficient, well-documented code.
+      Explain your reasoning when helpful.
+    toolsets:
+      - type: filesystem
+      - type: shell
+      - type: think
diff --git a/docs/pages/providers/nebius.html b/docs/pages/providers/nebius.html new file mode 100644 index 000000000..c5058ff73 --- /dev/null +++ b/docs/pages/providers/nebius.html @@ -0,0 +1,82 @@ +

Nebius

+

Use Nebius AI models with cagent.

+ +

Overview

+ +

Nebius provides AI models through an OpenAI-compatible API. cagent includes built-in support for Nebius as an alias provider.

+ +

Setup

+ +
  1. Get an API key from Nebius AI
  2. Set the environment variable:
     export NEBIUS_API_KEY=your-api-key
+ +

Usage

+ +

Inline Syntax

+ +

The simplest way to use Nebius:

+ +
agents:
+  root:
+    model: nebius/deepseek-ai/DeepSeek-V3
+    description: Assistant using Nebius
+    instruction: You are a helpful assistant.
+ +

Named Model

+ +

For more control over parameters:

+ +
models:
+  nebius_model:
+    provider: nebius
+    model: deepseek-ai/DeepSeek-V3
+    temperature: 0.7
+    max_tokens: 8192
+
+agents:
+  root:
+    model: nebius_model
+    description: Assistant using Nebius
+    instruction: You are a helpful assistant.
+ +

Available Models

+ +

Nebius hosts various open models. Check the Nebius documentation for the current model catalog.

+ + + + + + + + +
ModelDescription
deepseek-ai/DeepSeek-V3DeepSeek V3 model
Qwen/Qwen2.5-72B-InstructQwen 2.5 72B instruction-tuned
meta-llama/Llama-3.3-70B-InstructLlama 3.3 70B instruction-tuned
+ +

How It Works

+ +

Nebius is implemented as a built-in alias in cagent:

+ +
    +
  • API Type: OpenAI-compatible (openai_chatcompletions)
  • +
  • Base URL: https://api.studio.nebius.ai/v1
  • +
  • Token Variable: NEBIUS_API_KEY
  • +
+ +

Example: Code Assistant

+ +
agents:
+  coder:
+    model: nebius/deepseek-ai/DeepSeek-V3
+    description: Code assistant using DeepSeek
+    instruction: |
+      You are an expert programmer using DeepSeek V3.
+      Write clean, well-documented code.
+      Follow best practices for the language being used.
+    toolsets:
+      - type: filesystem
+      - type: shell
+      - type: think
diff --git a/docs/pages/providers/xai.html b/docs/pages/providers/xai.html new file mode 100644 index 000000000..1260936fb --- /dev/null +++ b/docs/pages/providers/xai.html @@ -0,0 +1,95 @@ +

xAI (Grok)

+

Use xAI's Grok models with cagent.

+ +

Overview

+ +

xAI provides the Grok family of models through an OpenAI-compatible API. cagent includes built-in support for xAI as an alias provider.

+ +

Setup

+ +
  1. Get an API key from xAI Console
  2. Set the environment variable:
     export XAI_API_KEY=your-api-key
+ +

Usage

+ +

Inline Syntax

+ +

The simplest way to use xAI:

+ +
agents:
+  root:
+    model: xai/grok-3
+    description: Assistant using Grok
+    instruction: You are a helpful assistant.
+ +

Named Model

+ +

For more control over parameters:

+ +
models:
+  grok:
+    provider: xai
+    model: grok-3
+    temperature: 0.7
+    max_tokens: 8192
+
+agents:
+  root:
+    model: grok
+    description: Assistant using Grok
+    instruction: You are a helpful assistant.
+ +

Available Models

+ + + + + + + + + + + +
ModelDescriptionContext
grok-3Latest and most capable Grok model131K
grok-3-fastFaster variant with lower latency131K
grok-3-miniCompact model for simpler tasks131K
grok-3-mini-fastFast variant of the mini model131K
grok-2Previous generation model128K
grok-visionVision-capable model32K
+ +

Check the xAI documentation for the latest available models.

+ +

Extended Thinking

+ +

Grok models support thinking mode through the OpenAI-compatible API:

+ +
models:
+  grok:
+    provider: xai
+    model: grok-3
+    thinking_budget: high  # minimal, low, medium, high, or none
+ +

How It Works

+ +

xAI is implemented as a built-in alias in cagent:

+ +
    +
  • API Type: OpenAI-compatible (openai_chatcompletions)
  • +
  • Base URL: https://api.x.ai/v1
  • +
  • Token Variable: XAI_API_KEY
  • +
+ +

Example: Research Assistant

+ +
agents:
+  researcher:
+    model: xai/grok-3
+    description: Research assistant with real-time knowledge
+    instruction: |
+      You are a research assistant using Grok.
+      Provide well-researched, factual responses.
+      Cite sources when available.
+    toolsets:
+      - type: mcp
+        ref: docker:duckduckgo
+      - type: think
diff --git a/docs/pages/tools/api.html b/docs/pages/tools/api.html new file mode 100644 index 000000000..56dd4bd7a --- /dev/null +++ b/docs/pages/tools/api.html @@ -0,0 +1,201 @@ +

API Tool

+

Create custom tools that call HTTP APIs.

+ +

Overview

+ +

The API tool type lets you define custom tools that make HTTP requests to external APIs. This is useful for integrating agents with REST APIs, webhooks, or any HTTP-based service without writing code.

+ +
+
ℹ️ When to Use
+
  • Integrating with REST APIs that don't have an MCP server
  • Simple HTTP operations (GET and POST)
  • Quick prototyping before building a full MCP server
+
+ +

Configuration

+ +
agents:
+  assistant:
+    model: openai/gpt-4o
+    description: Assistant with API access
+    instruction: You can look up weather information.
+    toolsets:
+      - type: api
+        name: get_weather
+        method: GET
+        endpoint: "https://api.weather.example/v1/current?city=${city}"
+        instruction: Get current weather for a city
+        args:
+          city:
+            type: string
+            description: City name to get weather for
+        required: ["city"]
+        headers:
+          Authorization: "Bearer ${env.WEATHER_API_KEY}"
+ +

Properties

| Property | Type | Description |
|---|---|---|
| name | string | Tool name (how the agent references it) |
| method | string | HTTP method: GET or POST |
| endpoint | string | URL endpoint (supports ${param} interpolation) |
| instruction | string | Description shown to the agent |
| args | object | Parameter definitions (JSON Schema properties) |
| required | array | List of required parameter names |
| headers | object | HTTP headers to include |
| output_schema | object | JSON Schema for the response (for documentation) |
+ +

HTTP Methods

+ +

GET Requests

+ +

For GET requests, parameters are interpolated into the URL:

+ +
toolsets:
+  - type: api
+    name: search_users
+    method: GET
+    endpoint: "https://api.example.com/users?q=${query}&limit=${limit}"
+    instruction: Search for users by name
+    args:
+      query:
+        type: string
+        description: Search query
+      limit:
+        type: integer
+        description: Maximum results (default 10)
+    required: ["query"]
+ +

POST Requests

+ +

For POST requests, parameters are sent as JSON in the request body:

+ +
toolsets:
+  - type: api
+    name: create_task
+    method: POST
+    endpoint: "https://api.example.com/tasks"
+    instruction: Create a new task
+    args:
+      title:
+        type: string
+        description: Task title
+      description:
+        type: string
+        description: Task description
+      priority:
+        type: string
+        enum: ["low", "medium", "high"]
+        description: Task priority
+    required: ["title"]
+    headers:
+      Content-Type: "application/json"
+      Authorization: "Bearer ${env.API_TOKEN}"
+ +

URL Interpolation

+ +

Use ${param} syntax to insert parameter values into URLs:

+ +
endpoint: "https://api.example.com/users/${user_id}/posts/${post_id}"
+ +

Parameter values are URL-encoded automatically.

+ +
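For example, with a hypothetical call where `user_id` is `jane doe` and `post_id` is `42`, the space is percent-encoded before the request is sent:

```yaml
endpoint: "https://api.example.com/users/${user_id}/posts/${post_id}"
# With user_id = "jane doe" and post_id = "42", the request goes to:
#   https://api.example.com/users/jane%20doe/posts/42
```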

Headers

+ +

Headers can include environment variables:

+ +
headers:
+  Authorization: "Bearer ${env.API_KEY}"
+  X-Custom-Header: "static-value"
+  Content-Type: "application/json"
+ +

Output Schema

+ +

Optionally document the expected response format:

+ +
toolsets:
+  - type: api
+    name: get_user
+    method: GET
+    endpoint: "https://api.example.com/users/${id}"
+    instruction: Get user details by ID
+    args:
+      id:
+        type: string
+        description: User ID
+    required: ["id"]
+    output_schema:
+      type: object
+      properties:
+        id:
+          type: string
+        name:
+          type: string
+        email:
+          type: string
+        created_at:
+          type: string
+ +

Example: GitHub API

+ +
agents:
+  github_assistant:
+    model: openai/gpt-4o
+    description: Assistant that can query GitHub
+    instruction: You can look up GitHub repositories and users.
+    toolsets:
+      - type: api
+        name: get_repo
+        method: GET
+        endpoint: "https://api.github.com/repos/${owner}/${repo}"
+        instruction: Get information about a GitHub repository
+        args:
+          owner:
+            type: string
+            description: Repository owner (user or org)
+          repo:
+            type: string
+            description: Repository name
+        required: ["owner", "repo"]
+        headers:
+          Accept: "application/vnd.github.v3+json"
+          Authorization: "Bearer ${env.GITHUB_TOKEN}"
+      
+      - type: api
+        name: get_user
+        method: GET
+        endpoint: "https://api.github.com/users/${username}"
+        instruction: Get information about a GitHub user
+        args:
+          username:
+            type: string
+            description: GitHub username
+        required: ["username"]
+        headers:
+          Accept: "application/vnd.github.v3+json"
+ +

Limitations

+ +
  • Only supports GET and POST methods
  • Response body is limited to 1 MB
  • 30-second timeout per request
  • Only HTTP and HTTPS URLs are supported
  • No support for file uploads or multipart forms
+ +
+
💡 For Complex APIs
+

For APIs that need authentication flows, pagination, or complex request/response handling, consider using an MCP server instead. The API tool is best for simple, stateless HTTP operations.

+
+ +
+
⚠️ Security
+

API keys and tokens in headers are visible in debug logs. Use environment variables (${env.VAR}) rather than hardcoding secrets in configuration files.

+
diff --git a/docs/pages/tools/lsp.html b/docs/pages/tools/lsp.html new file mode 100644 index 000000000..91be6a7ec --- /dev/null +++ b/docs/pages/tools/lsp.html @@ -0,0 +1,183 @@ +

LSP Tool

+

Connect to Language Server Protocol servers for code intelligence.

+ +

Overview

+ +

The LSP tool connects your agent to any Language Server Protocol (LSP) server, giving it code intelligence capabilities such as go-to-definition, find-references, and diagnostics.

+ +
+
ℹ️ What is LSP?
+

The Language Server Protocol is a standard for providing language features like autocomplete, go-to-definition, and diagnostics. Most programming languages have LSP servers available.

+
+ +

Configuration

+ +
agents:
+  developer:
+    model: anthropic/claude-sonnet-4-0
+    description: Code developer with LSP support
+    instruction: You are a software developer.
+    toolsets:
+      - type: lsp
+        command: gopls
+        args: []
+        file_types: [".go"]
+      - type: filesystem
+      - type: shell
+ +

Properties

| Property | Type | Description |
|---|---|---|
| command | string | LSP server executable command |
| args | array | Command-line arguments for the LSP server |
| env | object | Environment variables for the LSP process |
| file_types | array | File extensions this LSP server handles (e.g., [".go", ".mod"]) |
+ +

Available Tools

+ +

The LSP toolset provides these tools to the agent:

| Tool | Description |
|---|---|
| lsp_workspace | Get workspace info and available capabilities |
| lsp_hover | Get type info and documentation for a symbol |
| lsp_definition | Find where a symbol is defined |
| lsp_references | Find all references to a symbol |
| lsp_document_symbols | List all symbols in a file |
| lsp_workspace_symbols | Search symbols across the workspace |
| lsp_diagnostics | Get errors and warnings for a file |
| lsp_code_actions | Get available quick fixes and refactorings |
| lsp_rename | Rename a symbol across the workspace |
| lsp_format | Format a file |
| lsp_call_hierarchy | Find incoming/outgoing calls |
| lsp_type_hierarchy | Find supertypes/subtypes |
| lsp_implementations | Find interface implementations |
| lsp_signature_help | Get function signature at call site |
| lsp_inlay_hints | Get type annotations and parameter names |
+ +

Common LSP Servers

+ +

Here are configurations for popular languages:

+ +

Go (gopls)

+ +
toolsets:
+  - type: lsp
+    command: gopls
+    file_types: [".go"]
+ +

TypeScript/JavaScript (typescript-language-server)

+ +
toolsets:
+  - type: lsp
+    command: typescript-language-server
+    args: ["--stdio"]
+    file_types: [".ts", ".tsx", ".js", ".jsx"]
+ +

Python (pylsp)

+ +
toolsets:
+  - type: lsp
+    command: pylsp
+    file_types: [".py"]
+ +

Rust (rust-analyzer)

+ +
toolsets:
+  - type: lsp
+    command: rust-analyzer
+    file_types: [".rs"]
+ +

C/C++ (clangd)

+ +
toolsets:
+  - type: lsp
+    command: clangd
+    file_types: [".c", ".cpp", ".h", ".hpp"]
+ +

Multiple LSP Servers

+ +

You can configure multiple LSP servers for different file types:

+ +
agents:
+  polyglot:
+    model: anthropic/claude-sonnet-4-0
+    description: Multi-language developer
+    instruction: You are a full-stack developer.
+    toolsets:
+      - type: lsp
+        command: gopls
+        file_types: [".go"]
+      - type: lsp
+        command: typescript-language-server
+        args: ["--stdio"]
+        file_types: [".ts", ".tsx", ".js", ".jsx"]
+      - type: lsp
+        command: pylsp
+        file_types: [".py"]
+      - type: filesystem
+      - type: shell
+ +

Workflow Instructions

+ +

The LSP tool includes built-in instructions that guide the agent on how to use it effectively. The agent learns to:

+ +
  1. Start with lsp_workspace to understand available capabilities
  2. Use lsp_workspace_symbols to find relevant code
  3. Use lsp_references before modifying any symbol
  4. Check lsp_diagnostics after every code change
  5. Apply lsp_format after edits are complete
+ +
+
💡 Best Practice
+

Always include the filesystem tool alongside LSP. The agent needs filesystem access to read and write code files, while LSP provides intelligence about the code.

+
+ +

Capability Detection

+ +

Not all LSP servers support all features. The agent uses lsp_workspace to discover what's available:

+ +
Workspace Information:
+- Root: /path/to/project
+- Server: gopls v0.14.0
+- File types: .go
+
+Available Capabilities:
+- Hover: Yes
+- Go to Definition: Yes
+- Find References: Yes
+- Rename: Yes
+- Code Actions: Yes
+- Formatting: Yes
+- Call Hierarchy: Yes
+- Type Hierarchy: Yes
+...
+ +

Position Format

+ +

All LSP tools use 1-based line and character positions:

+ +
  • Line 1 is the first line of the file
  • Character 1 is the first character on a line
+ +
{
+  "file": "/path/to/file.go",
+  "line": 42,
+  "character": 15
+}
+ +
+
⚠️ Server Installation
+

The LSP server must be installed and available in the system PATH. cagent does not install LSP servers automatically. Install them using your language's package manager (e.g., go install golang.org/x/tools/gopls@latest).

+
diff --git a/docs/pages/tools/user-prompt.html b/docs/pages/tools/user-prompt.html new file mode 100644 index 000000000..80a4be87a --- /dev/null +++ b/docs/pages/tools/user-prompt.html @@ -0,0 +1,173 @@ +

User Prompt Tool

+

Ask the user questions and collect interactive input during agent execution.

+ +

Overview

+ +

The user prompt tool allows agents to ask questions and collect input from users during execution. This enables interactive workflows where the agent needs clarification, confirmation, or additional information before proceeding.

+ +
+
ℹ️ When to Use
+
  • Asking for clarification before proceeding
  • Collecting credentials or configuration values
  • Presenting choices and getting user decisions
  • Confirming destructive or important actions
+
+ +

Configuration

+ +
agents:
+  assistant:
+    model: openai/gpt-4o
+    description: Interactive assistant
+    instruction: |
+      You are a helpful assistant. When you need information
+      from the user, use the user_prompt tool to ask them.
+    toolsets:
+      - type: user_prompt
+      - type: filesystem
+      - type: shell
+ +

Tool Interface

+ +

The user_prompt tool takes these parameters:

| Parameter | Type | Description |
|---|---|---|
| message | string | The question or prompt to display |
| schema | object | JSON Schema defining expected response structure |
+ +

Response Format

+ +

The tool returns a JSON response:

+ +
{
+  "action": "accept",
+  "content": {
+    "field1": "user value",
+    "field2": true
+  }
+}
+ +

Action Values

| Action | Meaning |
|---|---|
| accept | User provided a response (check content) |
| decline | User declined to answer |
| cancel | User cancelled the prompt |
+ +
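A declined prompt, for example, comes back with no usable content. As a sketch (the exact payload shape beyond the `action` field is an assumption):

```json
{
  "action": "decline"
}
```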

Schema Examples

+ +

Simple String Input

+ +
{
+  "type": "string",
+  "title": "API Key",
+  "description": "Enter your API key"
+}
+ +

Multiple Choice

+ +
{
+  "type": "string",
+  "enum": ["development", "staging", "production"],
+  "title": "Environment",
+  "description": "Select the target environment"
+}
+ +

Boolean Confirmation

+ +
{
+  "type": "boolean",
+  "title": "Confirm",
+  "description": "Are you sure you want to proceed?"
+}
+ +

Object with Multiple Fields

+ +
{
+  "type": "object",
+  "properties": {
+    "username": {
+      "type": "string",
+      "description": "Your username"
+    },
+    "password": {
+      "type": "string",
+      "description": "Your password"
+    },
+    "remember": {
+      "type": "boolean",
+      "description": "Remember credentials"
+    }
+  },
+  "required": ["username", "password"]
+}
+ +

Number Input

+ +
{
+  "type": "integer",
+  "title": "Port Number",
+  "description": "Enter the port number (1024-65535)",
+  "minimum": 1024,
+  "maximum": 65535
+}
+ +

Example Usage

+ +

Here's how an agent might use the user prompt tool:

+ +
Agent: I need to deploy this application. Let me ask which environment to target.
+
+[Calls user_prompt with message: "Which environment should I deploy to?" 
+ and schema with enum: ["development", "staging", "production"]]
+
+User selects: "staging"
+
+Agent: Great, I'll deploy to staging. Let me confirm this action.
+
+[Calls user_prompt with message: "Deploy to staging? This will replace the current version."
+ and schema with type: "boolean"]
+
+User confirms: true
+
+Agent: Deploying to staging...
+ +

UI Presentation

+ +

How the prompt appears depends on the interface:

+ +
  • TUI: Displays an interactive dialog with appropriate input controls
  • CLI (exec mode): Prints the prompt and reads from stdin
  • API/MCP: Returns an elicitation request to the client
+ +
+
💡 Best Practice
+

Provide clear, concise messages. Include context about why you're asking and what the information will be used for. Use schemas with descriptions to guide users on expected input format.

+
+ +

Handling Responses

+ +

The agent should handle all possible actions:

+ +
  • accept: Process the content and continue
  • decline: Acknowledge and try an alternative approach, or explain what's needed
  • cancel: Stop the current operation gracefully
+ +
+
⚠️ Context Requirement
+

The user prompt tool requires an elicitation handler to be configured. It works in the TUI and CLI modes but may not be available in all contexts (e.g., some MCP client configurations).

+