Skip to content

AxmeAI/agent-timeout-and-escalation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Agent Timeout and Escalation

Your agent called a tool 30 minutes ago and never got a response. AXME adds timeout, retry, and human escalation to any agent action.

AI agents get stuck. External APIs go down. Long-running tasks hang. Without timeout and escalation, your agent waits forever - burning compute and blocking the pipeline. AXME adds configurable timeouts with automatic escalation chains.

Alpha - Built with AXME (AXP Intent Protocol). cloud.axme.ai - hello@axme.ai


The Problem

Agent calls external API:
  09:00 - POST /api/inventory/sync
  09:01 - waiting...
  09:05 - still waiting...
  09:30 - still waiting... (API is down)
  10:00 - still waiting... (nobody knows)
  11:00 - someone finally notices the agent is stuck

What should have happened:
  09:00 - POST /api/inventory/sync
  09:03 - TIMEOUT: 3 min deadline exceeded
  09:03 - RETRY: attempt 2 of 3
  09:06 - RETRY: attempt 3 of 3
  09:06 - ESCALATE: notify on-call engineer
  09:09 - REMIND: "still waiting for engineer"
  09:15 - Engineer resolves the issue

The Solution

intent_id = client.send_intent({
    "intent_type": "intent.ops.stuck_task.v1",
    "to_agent": "agent://myorg/production/task-agent",
    "payload": {"operation": "sync_inventory", "error": "connection_timeout"},
})

# AXME handles:
# - Retry (up to max_delivery_attempts)
# - Timeout (step_deadline_seconds)
# - Escalation to human (remind_after_seconds)
# - Full audit trail
result = client.wait_for(intent_id)

Quick Start

pip install axme
export AXME_API_KEY="your-key"   # Get one: axme login
from axme import AxmeClient, AxmeClientConfig
import os

client = AxmeClient(AxmeClientConfig(api_key=os.environ["AXME_API_KEY"]))

intent_id = client.send_intent({
    "intent_type": "intent.ops.stuck_task.v1",
    "to_agent": "agent://myorg/production/task-agent",
    "payload": {
        "task_id": "OPS-2026-0712",
        "operation": "sync_inventory",
        "target_api": "https://api.vendor.com/v2/inventory",
        "error": "connection_timeout",
        "retry_count": 3,
    },
})

print(f"Task submitted: {intent_id}")
result = client.wait_for(intent_id)
print(f"Resolution: {result['status']}")

Before / After

Before: DIY Timeout + Escalation

async def run_with_timeout(task, timeout=180):
    try:
        result = await asyncio.wait_for(task(), timeout=timeout)
    except asyncio.TimeoutError:
        for attempt in range(3):
            try:
                result = await asyncio.wait_for(task(), timeout=timeout)
                break
            except asyncio.TimeoutError:
                continue
        else:
            # All retries failed - escalate
            send_pagerduty_alert(task.id, "All retries exhausted")
            send_slack_message("#oncall", f"Task {task.id} stuck")
            # Now poll for human resolution...
            while True:
                status = db.get(f"escalation:{task.id}")
                if status == "resolved":
                    break
                await asyncio.sleep(10)
    return result

After: AXME Timeout + Escalation (built-in)

intent_id = client.send_intent({
    "intent_type": "intent.ops.stuck_task.v1",
    "to_agent": "agent://myorg/production/task-agent",
    "payload": {"operation": "sync_inventory", "error": "connection_timeout"},
})
result = client.wait_for(intent_id)

Timeout, retry, escalation, reminders - all configured in the scenario, not in your code.


How It Works

+-----------+  send_intent()   +----------------+  deliver    +-----------+
|           | ---------------> |                | ---------> |           |
| Initiator |                  |   AXME Cloud   |            |   Agent   |
|           | <- wait_for() -- |   (platform)   | <- resume  |           |
|           |                  |                |            | diagnoses |
|           |                  |  agent done    |            | problem   |
|           |                  |  v             |            +-----------+
|           |                  |  WAITING for   |
|           |                  |  human action  |  resolve   +-----------+
|           |                  |                | <--------- |           |
|           |                  |  - remind 3m   |            |  On-Call  |
|           | <- COMPLETED --- |  - timeout 30m |            | Engineer  |
+-----------+                  +----------------+            +-----------+

Run the Full Example

Prerequisites

curl -fsSL https://raw.githubusercontent.com/AxmeAI/axme-cli/main/install.sh | sh
axme login
pip install axme

Terminal 1

axme scenarios apply scenario.json

Terminal 2

cat ~/.config/axme/scenario-agents.json | grep -A2 timeout-agent-demo
AXME_API_KEY=<agent-key> python agent.py

Terminal 1 - resolve

axme tasks approve <intent_id>

Verify

axme intents get <intent_id>
# lifecycle_status: COMPLETED

Related


Built with AXME (AXP Intent Protocol).

About

Your agent called a tool 30 minutes ago and never got a response. AXME adds timeout, retry, and human escalation.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages