Chrome DevTools for AI Agents - Real-time debugging: pause, inspect, and step through your AI agent's execution.
Building AI agents is hard. Debugging them is harder.
"Your AI agent worked perfectly in testing. Then it hit production and called the wrong tool 14 times in a loop, burned $40 of API credits, and returned gibberish to your user. This is not a rare scenario. It's the default scenario."
Existing observability tools show you what happened after the fact. AgentDBG lets you watch it happen and stop it when things go wrong.
- Real-time Visualization - Watch your agent's execution unfold in real time
- Pause & Resume - Stop execution at any point to inspect state
- Step-through Debugging - Advance one LLM call at a time
- Breakpoints - Pause on cost thresholds, errors, or custom conditions
- Cost Tracking - Real-time token and cost tracking per span
- Auto-instrumentation - Zero-config support for OpenAI, Anthropic, and LangChain
- Local-first - All data stays on your machine, sub-millisecond overhead
pip install agent-devtools

Run any Python script with AgentDBG instrumentation:

agentdbg run my_agent.py

This will:
- Auto-instrument OpenAI, Anthropic, and LangChain calls
- Start the debugging UI at http://localhost:8766
- Open your browser to the live trace viewer
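As a rough illustration of what auto-instrumentation of this kind generally involves, the sketch below wraps a method on a client class so that every call is timed and recorded. It is not AgentDBG's actual implementation: `FakeClient`, `instrument`, and `RECORDED_SPANS` are hypothetical names invented for this example.

```python
import functools
import time

# Hypothetical in-memory sink; a real tool would send these to a UI or database.
RECORDED_SPANS = []


def instrument(cls, method_name):
    """Replace cls.method_name with a wrapper that records each call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return original(self, *args, **kwargs)
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so every call leaves a record.
            RECORDED_SPANS.append({
                "name": f"{cls.__name__}.{method_name}",
                "duration_s": time.perf_counter() - start,
                "error": error,
            })

    setattr(cls, method_name, wrapper)


class FakeClient:
    """Stand-in for an LLM SDK client (hypothetical)."""

    def complete(self, prompt):
        return prompt.upper()


instrument(FakeClient, "complete")
reply = FakeClient().complete("hello")
```

Because the wrapper is installed on the class, existing call sites need no changes, which is what makes a "zero-config" mode possible in principle.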
For more control, use the @traced decorator or trace context manager:
from agentdbg import trace, traced, SpanKind

# Using decorator
@traced(name="process_query", kind=SpanKind.AGENT_STEP)
def process_query(query: str) -> str:
    # Your agent logic here; this placeholder just echoes the query
    result = f"processed: {query}"
    return result

# Using context manager (call_llm and messages are your own function and input)
with trace(name="llm_call", kind=SpanKind.LLM_CALL) as span:
    response = call_llm(messages)
    span.output_data = {"response": response}

# Run a script with debugging
agentdbg run script.py
# Run with cost limit (pause when exceeded)
agentdbg run script.py --cost-limit 1.0
# Run paused at start
agentdbg run script.py --pause-on-start
# Start server only (for external connections)
agentdbg server
# View recent traces
agentdbg traces
# Show statistics
agentdbg stats
# Clean up old traces
agentdbg cleanup --days 7

- Pause - Stop execution at the current point
- Resume - Continue execution
- Step - Execute one span and pause again
- Clear - Remove all traces
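To make the Pause, Resume, and Step semantics concrete, here is a minimal sketch of a gate that an instrumented agent could check before each span. It assumes a thread-based design built on `threading.Event`; the `PauseGate` class and its method names are hypothetical and are not taken from AgentDBG.

```python
import threading


class PauseGate:
    """Hypothetical gate that agent code checks before starting each span."""

    def __init__(self):
        self._running = threading.Event()
        self._running.set()  # start in the running state
        self._step_once = False
        self._lock = threading.Lock()

    def pause(self):
        self._running.clear()

    def resume(self):
        with self._lock:
            self._step_once = False
        self._running.set()

    def step(self):
        # Let exactly one span through, then pause again.
        with self._lock:
            self._step_once = True
        self._running.set()

    def wait_if_paused(self, timeout=None):
        """Block while paused; return True if the caller may proceed."""
        passed = self._running.wait(timeout)
        with self._lock:
            if passed and self._step_once:
                self._step_once = False
                self._running.clear()
        return passed


gate = PauseGate()
gate.pause()
blocked = gate.wait_if_paused(timeout=0.01)  # still paused
gate.step()
first = gate.wait_if_paused(timeout=0.01)    # one span is allowed through
second = gate.wait_if_paused(timeout=0.01)   # paused again after the step
gate.resume()
third = gate.wait_if_paused(timeout=0.01)    # running normally
```

In a real debugger the UI thread would call `pause`, `resume`, and `step`, while the agent thread would call `wait_if_paused` without a timeout.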
Set breakpoints programmatically:
from agentdbg import get_debugger

debugger = get_debugger()

# Pause when cost exceeds $0.50
debugger.state.add_breakpoint(
    lambda span: span.cost.total_cost > 0.50
)

# Pause on any error
debugger.state.add_breakpoint(
    lambda span: span.error is not None
)

# Pause on specific span name
debugger.state.add_breakpoint(
    lambda span: "dangerous_tool" in span.name
)

AgentDBG automatically tracks costs for popular models:
from agentdbg.config import MODEL_COSTS
# Supported models:
# - OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o1-mini
# - Anthropic: claude-3-opus, claude-3-sonnet, claude-3-haiku, claude-3-5-sonnet

Costs are displayed in real time in the UI and can trigger breakpoints.
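For readers curious how per-call cost tracking and a cost breakpoint might fit together, the sketch below estimates cost from token counts and a price table. The structure of AgentDBG's `MODEL_COSTS` is not shown in this README, so `EXAMPLE_PRICES`, `Usage`, `estimate_cost`, and `over_budget` are all hypothetical, and the prices are placeholders rather than real provider pricing.

```python
from dataclasses import dataclass

# Placeholder prices in USD per 1M tokens; NOT real pricing for any provider.
EXAMPLE_PRICES = {
    "example-model": {"input": 2.00, "output": 8.00},
}


@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int


def estimate_cost(usage, prices=EXAMPLE_PRICES):
    """Return the estimated USD cost of one call, or 0.0 for unknown models."""
    price = prices.get(usage.model)
    if price is None:
        return 0.0
    weighted = (usage.input_tokens * price["input"]
                + usage.output_tokens * price["output"])
    return weighted / 1_000_000


def over_budget(usages, limit_usd):
    """True once the summed cost of all calls exceeds the limit."""
    return sum(estimate_cost(u) for u in usages) > limit_usd


calls = [Usage("example-model", 100_000, 50_000)] * 2
total = sum(estimate_cost(c) for c in calls)
```

A predicate like `over_budget` is the kind of check a cost-limit breakpoint could evaluate after each span completes.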
from openai import OpenAI
from agentdbg.instrumentors import auto_instrument
auto_instrument() # Done automatically by CLI
client = OpenAI()
# All calls are now traced automatically
response = client.chat.completions.create(...)

from anthropic import Anthropic
from agentdbg.instrumentors import auto_instrument
auto_instrument()
client = Anthropic()
# All calls are now traced automatically
response = client.messages.create(...)

from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor
from agentdbg.instrumentors.langchain_instrumentor import AgentDBGCallbackHandler

# Use the callback handler (agent and tools are assumed to be defined elsewhere)
handler = AgentDBGCallbackHandler()
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    callbacks=[handler],
)

Traces are stored locally in SQLite:
from agentdbg.storage import SQLiteStorage
storage = SQLiteStorage(".agentdbg/traces.db")
# Get recent traces
traces = storage.get_traces(limit=10)
# Get statistics
stats = storage.get_stats()
print(f"Total cost: ${stats['total_cost']:.2f}")
print(f"Total tokens: {stats['total_tokens']:,}")
# Clean up old data
storage.delete_old_traces(days=7)

from agentdbg import DebugConfig, AgentDebugger

config = DebugConfig(
    # Server
    host="127.0.0.1",
    port=8765,
    ui_port=8766,
    # Auto-pause
    auto_pause_on_error=True,
    auto_pause_on_cost=1.0,        # Pause at $1.00
    auto_pause_on_tokens=100000,   # Pause at 100k tokens
    # Data capture
    capture_inputs=True,
    capture_outputs=True,
    max_input_size=10000,
    max_output_size=10000,
)
debugger = AgentDebugger(config=config)

┌─────────────────────────────────────────────────────────────┐
│                       Your Agent Code                       │
│           (OpenAI, Anthropic, LangChain, Custom)            │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                        AgentDBG SDK                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ Instrumentor│  │    Core     │  │       Storage       │  │
│  │  (auto-wrap)│  │(trace/span) │  │      (SQLite)       │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      WebSocket Server                       │
│                    (Real-time streaming)                    │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                           Web UI                            │
│  ┌────────────┐  ┌───────────┐  ┌────────────────────────┐  │
│  │ Trace List │  │ Span Tree │  │ Inspector (State/Cost) │  │
│  └────────────┘  └───────────┘  └────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
| Feature | AgentDBG | Langfuse | LangSmith |
|---|---|---|---|
| Real-time pause/resume | ✅ | ❌ | ❌ |
| Step-through debugging | ✅ | ❌ | ❌ |
| Breakpoints | ✅ | ❌ | ❌ |
| Local-first | ✅ | ❌ | |
| Zero-config | ✅ | | |
| Open source | ✅ | ✅ | ❌ |
| Cost tracking | ✅ | ✅ | ✅ |
# Clone the repo
git clone https://github.com/agentdbg/agentdbg.git
cd agentdbg
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
# Run linting
ruff check src tests
mypy src

Contributions are welcome! Please read our Contributing Guide for details.
MIT License - see LICENSE for details.
Built with frustration and love by developers who've spent too many hours staring at logs wondering why their agent decided to search Google 47 times in a row.
Stop guessing why your agent failed. See every thought. Pause anywhere. Fix it live.