A production-ready autonomous coding system that builds complete applications over multiple sessions. Features enterprise-grade quality gates, human-in-the-loop controls, parallel execution, and comprehensive design iteration workflows.
- Human Input Gates - Agents request credentials, API keys, and design decisions when genuinely needed
- Supervisory Agent Mode - Optional AI supervisor handles all human input requests for full automation
- Pause/Drain Mode - Graceful pause that lets running agents finish their current work (better than Ctrl+C)
- Checkpoint System - Automated code review every N features (code quality, security, performance)
- Design Iteration - Multi-persona UX review (visual designer, UX researcher, accessibility expert)
- Skip Management - Smart handling of blocked features with dependency tracking
- Metrics Dashboard - Real-time performance tracking with ROI analysis
- Run 1-5 concurrent agents with dependency-aware scheduling
- Isolated browser contexts per agent
- Automatic regression testing with configurable agent ratios
- Mission control dashboard with agent mascots (Spark, Fizz, Octo, Hoot, Buzz)
- React UI with Tailwind CSS v4 (neobrutalism design)
- FastAPI backend with WebSocket real-time updates
- SQLite with proper migrations and cross-platform support
- Defense-in-depth security (sandbox, filesystem restrictions, command allowlist)
- Playwright CLI for reliable browser automation
1. Claude Code CLI (Required)
macOS / Linux:
curl -fsSL https://claude.ai/install.sh | bash
Windows (PowerShell):
irm https://claude.ai/install.ps1 | iex
2. Authentication
Choose one:
- Claude Pro/Max - Run `claude login` (recommended)
- Anthropic API Key - Pay-per-use from https://console.anthropic.com/
Windows:
start_ui.bat
macOS / Linux:
./start_ui.sh
Opens at http://localhost:5173 with:
- Kanban board with drag & drop
- Real-time agent output streaming
- Start/pause/stop controls
- Progress tracking and metrics
- Dependency graph visualization
Windows:
start.bat
macOS / Linux:
./start.sh
The CLI menu provides:
- Create new project with the `/create-spec` command
- Continue existing projects
- Automatic environment setup
- Authentication verification
Initializer Agent (First Session)
- Reads your app specification (from `app_spec.txt` in XML format)
- Generates feature test cases with priorities
- Sets up project structure and git
- Creates `features.db` with all test cases
Coding Agent (Subsequent Sessions)
- Implements features one by one
- Verifies via browser automation (Playwright CLI)
- Runs regression tests on passing features
- Auto-continues with 3-second delay between sessions
The agent interacts with features through an MCP server:
Core Operations:
- `feature_get_next` - Get highest-priority pending feature
- `feature_claim_next` - Atomically claim feature (parallel mode)
- `feature_mark_passing` - Mark feature complete
- `feature_skip` - Move feature to end of queue
- `feature_get_for_regression` - Random passing features for testing
- `feature_request_human_input` - Request structured input from humans
- `ask_user` - Ask questions with selectable options and get responses
Dependency Management:
- `feature_add_dependency` - Add dependency (with cycle detection)
- `feature_get_ready` - Get features with satisfied dependencies
- `feature_get_blocked` - Get features blocked by dependencies
- `feature_get_graph` - Dependency graph for visualization
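As a rough illustration of the cycle detection mentioned above, the sketch below rejects a new dependency edge if it would make the graph cyclic. It is a minimal depth-first search over an in-memory dict. The names `would_create_cycle` and `add_dependency` are hypothetical and are not taken from `dependency_resolver.py`, whose actual implementation may differ.

```python
from collections import defaultdict


def would_create_cycle(deps, feature_id, depends_on):
    """Return True if making feature_id depend on depends_on closes a cycle.

    deps maps a feature id to the set of feature ids it depends on.
    A cycle appears when depends_on already (transitively) depends on feature_id.
    """
    if feature_id == depends_on:
        return True
    stack, seen = [depends_on], set()
    while stack:
        current = stack.pop()
        if current == feature_id:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(deps.get(current, ()))
    return False


def add_dependency(deps, feature_id, depends_on):
    """Hypothetical helper: add the edge only if the graph stays acyclic."""
    if would_create_cycle(deps, feature_id, depends_on):
        raise ValueError(f"dependency {feature_id} -> {depends_on} would create a cycle")
    deps[feature_id].add(depends_on)


deps = defaultdict(set)
add_dependency(deps, 2, 1)  # feature 2 depends on feature 1
add_dependency(deps, 3, 2)  # feature 3 depends on feature 2
```

Checking before inserting keeps the stored graph acyclic at all times, so a scheduler that looks for "ready" features never has to deal with a loop.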
Instead of going through an MCP server, agents interact with the browser using simple playwright-cli bash commands:
# Open browser and navigate
playwright-cli open http://localhost:3000
# Take snapshot to see element refs
playwright-cli snapshot
# Interact with elements
playwright-cli click e15
playwright-cli type "search query"
playwright-cli fill e3 "user@example.com"
# Verify visually
playwright-cli screenshot
# Check for errors
playwright-cli console
# Close when done
playwright-cli close
Benefits:
- Simpler than MCP server (fewer moving parts)
- Snapshots save to `.playwright-cli/` (token-efficient)
- Direct bash invocation (no subprocess overhead)
- Easier debugging via logs
When agents need credentials or design decisions:
# Agent requests OAuth credentials
feature_request_human_input(
feature_id=1,
prompt="Need OAuth credentials for Google API",
fields=[
{
"id": "client_id",
"label": "OAuth Client ID",
"type": "text",
"required": True
},
{
"id": "client_secret",
"label": "OAuth Client Secret",
"type": "text",
"required": True
}
]
)
Response via UI or API:
curl -X POST http://localhost:8000/api/projects/my-app/features/1/human-input \
-H "Content-Type: application/json" \
-d '{
"client_id": "123456.apps.googleusercontent.com",
"client_secret": "GOCSPX-secret-123"
}'
Supervisory Agent Mode (Full Automation):
export ENABLE_SUPERVISORY_AGENT=true
python autonomous_agent_demo.py --project-dir my-app --supervisory-agent
The AI supervisor automatically provides mock credentials for development.
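To make the request/response shape above concrete, here is a minimal sketch of how a submitted response could be checked against the `fields` list before it is handed back to the agent. The function `missing_required_fields` is hypothetical; the README does not describe how the server validates input, so treat this as one plausible approach only.

```python
def missing_required_fields(fields, response):
    """Return ids of required fields that are absent or blank in the response.

    Hypothetical helper: fields follows the shape passed to
    feature_request_human_input, response is the JSON body that was posted.
    """
    missing = []
    for field in fields:
        value = response.get(field["id"])
        if field.get("required") and (value is None or str(value).strip() == ""):
            missing.append(field["id"])
    return missing


fields = [
    {"id": "client_id", "label": "OAuth Client ID", "type": "text", "required": True},
    {"id": "client_secret", "label": "OAuth Client Secret", "type": "text", "required": True},
]
response = {"client_id": "123456.apps.googleusercontent.com"}
print(missing_required_fields(fields, response))
```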
Graceful pause that drains running agents (better than Ctrl+C):
# Via API
curl -X POST http://localhost:8000/api/projects/my-app/agent/pause
# Via UI
# Click "Pause" button → agents drain → enters paused state
# Resume
curl -X POST http://localhost:8000/api/projects/my-app/agent/resume
How it works:
- Creates `.autocoder/.pause_drain` signal file
- Orchestrator stops spawning new agents
- Running agents complete their current features
- Enters paused state until signal file removed
- Resume deletes signal file and continues
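The steps above can be sketched as a small decision function driven by the signal file. This is an illustrative model only: the `.autocoder/.pause_drain` path comes from the list above, but `request_pause`, `resume`, and `next_action` are hypothetical names and do not reflect the orchestrator's real code.

```python
import tempfile
from pathlib import Path

PAUSE_SIGNAL = Path(".autocoder") / ".pause_drain"  # signal file named in the README


def request_pause(project_dir):
    """Create the signal file (hypothetical helper)."""
    signal = Path(project_dir) / PAUSE_SIGNAL
    signal.parent.mkdir(parents=True, exist_ok=True)
    signal.touch()


def resume(project_dir):
    """Delete the signal file; a missing file is not an error."""
    (Path(project_dir) / PAUSE_SIGNAL).unlink(missing_ok=True)


def next_action(project_dir, running_agents):
    """Decide what an orchestrator loop would do on this tick."""
    if not (Path(project_dir) / PAUSE_SIGNAL).exists():
        return "spawn"  # normal operation: new agents may start
    # Signal present: never spawn; wait for running agents, then idle.
    return "drain" if running_agents > 0 else "paused"


with tempfile.TemporaryDirectory() as project_dir:
    request_pause(project_dir)
    print(next_action(project_dir, running_agents=2))
```

Using a file as the signal means a pause request survives a restart of the process that issued it, and it can be inspected or removed by hand when debugging.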
Automated quality gates every N features:
# Configure checkpoints
{
"checkpoint_interval": 10, # Review every 10 features
"agents": {
"code_review": true, # Code quality analysis
"security": true, # OWASP Top 10 audit
"performance": true # Performance analysis
},
"pause_on_critical": true # Auto-pause on critical issues
}
Checkpoint Agents:
- Code Review - Linting, type safety, maintainability
- Security Audit - XSS, SQL injection, CSRF, auth issues
- Performance - Bundle size, render performance, optimization
Reports saved to checkpoints/checkpoint_N.md
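A minimal sketch of how the `checkpoint_interval` and `pause_on_critical` settings could drive the gate. Both functions and the `severity` labels are assumptions made for illustration; the README does not specify how findings are represented.

```python
def checkpoint_due(passing_count, interval, last_checkpoint_at=0):
    """True when at least `interval` features have passed since the last checkpoint."""
    if interval <= 0:
        return False  # assumption: a non-positive interval disables checkpoints
    return passing_count - last_checkpoint_at >= interval


def should_pause(findings, pause_on_critical=True):
    """True when pausing is enabled and any finding is marked critical.

    findings is a list of dicts with a hypothetical "severity" key.
    """
    return pause_on_critical and any(f["severity"] == "critical" for f in findings)


findings = [{"agent": "security", "severity": "critical", "title": "SQL injection"}]
if checkpoint_due(passing_count=10, interval=10) and should_pause(findings):
    print("checkpoint found critical issues; pausing")
```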
Multi-persona UX review workflow:
python design/review.py --project-dir my-app --iteration 1
Personas:
- Visual Designer - Color theory, typography, layout
- UX Researcher - User flows, accessibility, usability
- Accessibility Expert - WCAG compliance, screen readers
- Brand Strategist - Consistency, messaging, tone
Each persona provides feedback → agent iterates → repeat until approved.
Run multiple agents concurrently:
python autonomous_agent_demo.py \
--project-dir my-app \
--parallel \
--max-concurrency 3 \
--testing-agent-ratio 1
Features:
- Dependency-aware scheduling (blocked features skipped)
- Isolated browser contexts per agent
- Atomic feature claiming (no race conditions)
- Configurable testing agent ratio (0-3 per coding agent)
- Mission Control UI with agent status tracking
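Atomic claiming is what keeps two agents from picking the same feature. One common way to get that with SQLite is to take the write lock before reading, as sketched below. The table layout, the `status` values, and the rule that a lower `priority` number wins are all assumptions for this example, not the project's actual schema.

```python
import sqlite3


def claim_next_feature(conn, agent_id):
    """Atomically claim the next pending feature, or return None if none is left."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before the SELECT
    try:
        row = conn.execute(
            "SELECT id FROM features WHERE status = 'pending' "
            "ORDER BY priority ASC, id ASC LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute(
            "UPDATE features SET status = 'in_progress', claimed_by = ? WHERE id = ?",
            (agent_id, row[0]),
        )
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None gives manual transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE features (id INTEGER PRIMARY KEY, priority INTEGER, "
    "status TEXT DEFAULT 'pending', claimed_by TEXT)"
)
conn.executemany("INSERT INTO features (id, priority) VALUES (?, ?)", [(1, 2), (2, 1)])
```

Because the select and the update run inside one immediate transaction, a second connection that tries to claim at the same time has to wait for the lock and then sees the row as already taken.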
Real-time dashboard and ROI analysis:
python metrics/dashboard.py --project-dir my-app
Metrics tracked:
- Features per hour
- Average feature duration
- Regression test coverage
- Blocker resolution time
- Token usage and cost
- ROI calculation (dev time saved)
Reports: metrics/performance_report.md
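For a sense of how the first two metrics could be derived, the sketch below computes features per hour and average feature duration from start and finish timestamps. The `summarize` function and its input shape are hypothetical, and the ROI calculation is left out because the README does not define its formula.

```python
from datetime import datetime, timedelta


def summarize(completions):
    """Summarize a list of (started_at, finished_at) datetime pairs."""
    if not completions:
        return {"features_per_hour": 0.0, "avg_minutes": 0.0}
    durations = [(end - start).total_seconds() for start, end in completions]
    first_start = min(start for start, _ in completions)
    last_end = max(end for _, end in completions)
    span_hours = (last_end - first_start).total_seconds() / 3600
    return {
        # Throughput over the wall-clock span from first start to last finish.
        "features_per_hour": len(completions) / span_hours if span_hours > 0 else 0.0,
        "avg_minutes": sum(durations) / len(durations) / 60,
    }


t0 = datetime(2025, 1, 1, 9, 0)
runs = [
    (t0, t0 + timedelta(minutes=10)),
    (t0 + timedelta(minutes=10), t0 + timedelta(minutes=30)),
]
print(summarize(runs))
```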
autocoder-2/
├── start.bat / start.sh          # CLI launcher
├── start_ui.bat / start_ui.sh    # Web UI launcher
├── start.py                      # CLI menu with project management
├── start_ui.py                   # FastAPI server launcher
├── autonomous_agent_demo.py      # Agent entry point
├── agent.py                      # Session loop (Claude Agent SDK)
├── client.py                     # ClaudeSDKClient with security hooks
├── security.py                   # Bash allowlist validation
├── prompts.py                    # Prompt template loading
├── progress.py                   # Progress tracking & webhooks
├── registry.py                   # Project registry (SQLite)
├── paths.py                      # Runtime file path resolution
│
├── api/
│   ├── database.py               # SQLAlchemy models (Feature, Checkpoint, etc.)
│   └── dependency_resolver.py    # Cycle detection (Kahn's + DFS)
│
├── mcp_server/
│   └── feature_mcp.py            # MCP server with 15+ tools
│
├── checkpoint/
│   ├── orchestrator.py           # Checkpoint execution engine
│   ├── agent_code_review.py      # Code quality agent
│   ├── agent_security.py         # Security audit agent
│   ├── agent_performance.py      # Performance agent
│   └── autofix.py                # Auto-create fix features
│
├── design/
│   ├── persona_system.py         # Persona loading & management
│   ├── iteration.py              # Design iteration workflow
│   └── review.py                 # CLI tool for design review
│
├── metrics/
│   ├── collector.py              # Performance metrics collection
│   ├── dashboard.py              # Real-time CLI dashboard
│   └── report_generator.py       # ROI analysis reports
│
├── parallel_orchestrator.py      # Concurrent agent execution
│
├── server/
│   ├── main.py                   # FastAPI REST API
│   ├── websocket.py              # Real-time WebSocket updates
│   ├── routers/                  # API endpoints
│   │   ├── agent.py              # Agent control (start/stop/pause)
│   │   ├── features.py           # Feature CRUD + human input
│   │   ├── projects.py           # Project management
│   │   └── filesystem.py         # Folder browser
│   └── services/
│       └── process_manager.py    # Agent subprocess management
│
├── ui/                           # React 18 + TypeScript
│   ├── src/
│   │   ├── App.tsx               # Main app
│   │   ├── components/           # UI components
│   │   │   ├── AgentMissionControl.tsx
│   │   │   ├── DependencyGraph.tsx
│   │   │   └── CelebrationOverlay.tsx
│   │   ├── hooks/
│   │   │   ├── useWebSocket.ts   # Real-time updates
│   │   │   └── useProjects.ts    # React Query hooks
│   │   └── lib/
│   │       ├── api.ts            # REST client
│   │       └── types.ts          # TypeScript types
│   └── tailwind.config.ts        # Neobrutalism theme
│
├── .claude/
│   ├── commands/
│   │   └── create-spec.md        # /create-spec slash command
│   ├── skills/
│   │   ├── playwright-cli/       # Browser automation skill
│   │   └── frontend-design/      # Distinctive UI design skill
│   └── templates/                # Prompt templates
│
├── tests/
│   ├── test_human_input_system.py
│   ├── test_pause_drain_mode.py
│   └── test_phase*.py            # Integration tests
│
├── docs/
│   ├── DEVELOPER_GUIDE.md
│   ├── TROUBLESHOOTING.md
│   ├── UAT_HUMAN_INPUT_AND_PAUSE.md
│   └── requirements/             # PRD documents
│
└── requirements.txt              # Python dependencies
Defense-in-depth approach (see security.py and client.py):
Bash commands run in isolated environment (prevents filesystem escape)
File operations restricted to project directory only (Read(./**), Write(./**))
Only whitelisted commands permitted:
ALLOWED_COMMANDS = {
# File inspection
"ls", "cat", "head", "tail", "wc", "grep",
# File operations
"cp", "mkdir", "chmod", "mv", "rm", "touch",
# Node.js development
"npm", "npx", "pnpm", "node",
# Version control
"git",
# Browser automation
"playwright-cli",
# Process management
"ps", "lsof", "sleep", "kill", "pkill",
# Other
"curl", "docker", "sh", "bash"
}
Commands not in the allowlist are blocked by the security hook.
Special validation for sensitive commands:
- `pkill` - Only dev server processes
- `chmod` - No dangerous permissions
- `init.sh` - Environment setup only
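To illustrate the allowlist idea, the sketch below pulls the program name out of each simple command in a shell line and rejects the line if any name is not allowed. It is deliberately simplified and is not the logic in `security.py`: `command_names` and `is_allowed` are hypothetical, the allowlist is a small subset, and subshells, command substitution, and redirections are not handled (unknown tokens are treated as command names, so such lines fail closed).

```python
import shlex

ALLOWED = {"ls", "cat", "grep", "npm", "node", "git", "playwright-cli"}  # demo subset
SEPARATORS = {";", "&&", "||", "|", "&"}


def command_names(command_line):
    """Yield the first word of each simple command in a shell line."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    expect_name = True
    for token in lexer:
        if token in SEPARATORS:
            expect_name = True  # the next word starts a new command
        elif expect_name:
            yield token
            expect_name = False


def is_allowed(command_line):
    """True only if the line parses and every command name is in the allowlist."""
    try:
        names = list(command_names(command_line))
    except ValueError:  # for example, an unbalanced quote
        return False
    return bool(names) and all(name in ALLOWED for name in names)


print(is_allowed("ls -la | grep src"))
print(is_allowed("git status && rm -rf /"))
```

Checking every command in a pipeline or chain matters because an allowed first command could otherwise be used to smuggle in a blocked one after `&&` or `|`.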
Create .env file in project root:
# N8N webhook for progress notifications (optional)
PROGRESS_N8N_WEBHOOK_URL=https://n8n.example.com/webhook/abc123
# Alternative API provider (optional)
ANTHROPIC_DEFAULT_SONNET_MODEL=glm-4.7
# Supervisory agent mode (optional)
ENABLE_SUPERVISORY_AGENT=true
SUPERVISORY_AGENT_MODEL=claude-sonnet-4-5
# Playwright settings (optional)
PLAYWRIGHT_HEADLESS=false # Show browser for monitoring
To use GLM models instead of Claude:
ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
ANTHROPIC_AUTH_TOKEN=your-zhipu-api-key
API_TIMEOUT_MS=3000000
ANTHROPIC_DEFAULT_SONNET_MODEL=glm-4.7
ANTHROPIC_DEFAULT_OPUS_MODEL=glm-4.7
ANTHROPIC_DEFAULT_HAIKU_MODEL=glm-4.5-air
Get API key: https://z.ai/subscribe
Note: This only affects AutoCoder. Your global Claude Code settings remain unchanged.
Projects can be stored anywhere on disk. The registry maps names to paths:
# Registry location (cross-platform)
~/.autocoder/registry.db
# View registered projects
python start.py # Shows list in CLI menu
The registry uses SQLite with POSIX paths for cross-platform compatibility.
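Storing POSIX-style paths means a registry entry uses forward slashes regardless of the operating system that wrote it. A minimal sketch of that normalization with `pathlib` follows; the function `to_registry_path` is hypothetical and the real `registry.py` may do this differently.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_registry_path(native_path, windows=False):
    """Normalize a native path string to forward-slash form for storage.

    The windows flag selects the parsing rules explicitly so the example
    behaves the same on every platform.
    """
    pure = PureWindowsPath(native_path) if windows else PurePosixPath(native_path)
    return pure.as_posix()


print(to_registry_path(r"C:\Users\dev\my-app", windows=True))
print(to_registry_path("/home/dev/my-app"))
```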
cd ui
npm install
npm run dev # Hot reload at http://localhost:5173
cd ui
npm run build # Builds to ui/dist/
Note: start_ui.bat/start_ui.sh serve the pre-built UI. Run npm run build after UI changes.
- React 18 with TypeScript
- TanStack Query for data fetching
- Tailwind CSS v4 with custom theme (`@theme` directive)
- Radix UI components (accessible by default)
- dagre for dependency graph layout
- WebSocket for real-time updates
WebSocket endpoint: /ws/projects/{project_name}
Message Types:
type WSMessage =
| { type: "progress"; data: { passing: number; total: number } }
| { type: "agent_status"; data: "running" | "paused" | "stopped" | "crashed" }
| { type: "log"; data: string; featureId?: number; agentIndex?: number }
| { type: "feature_update"; data: Feature }
| { type: "agent_update"; data: AgentState[] }
Custom design system with bold borders and vibrant colors:
/* globals.css - @theme directive */
@theme {
--color-neo-pending: #fbbf24; /* amber-400 */
--color-neo-progress: #06b6d4; /* cyan-500 */
--color-neo-done: #10b981; /* emerald-500 */
--animate-slide-in: slide-in 0.3s ease-out;
--animate-pulse-neo: pulse-neo 2s ease-in-out infinite;
}
# Install test dependencies
pip install pytest pytest-asyncio pytest-cov
# Run all tests
python -m pytest tests/ -v
# Run with coverage
python -m pytest tests/ --cov=api --cov=parallel_orchestrator --cov=paths --cov-report=term-missing
# Run specific test files
python -m pytest tests/test_human_input_system.py -v
python -m pytest tests/test_pause_drain_mode.py -v
Current coverage: 31% overall (81% for new features)
- api/database.py: 81% - Human input system, migrations
- paths.py: 82% - Pause/drain infrastructure
- parallel_orchestrator.py: 14% - Core logic tested (full integration pending)
Comprehensive user acceptance testing guide:
cat docs/UAT_HUMAN_INPUT_AND_PAUSE.md
Includes:
- 5 complete test scenarios (human mode + supervisory agent)
- Configuration instructions
- Test data and expected results
- Success criteria (18 checkpoints)
- Troubleshooting guide
- API reference with examples
Building complete applications takes time!
| Phase | Duration | Notes |
|---|---|---|
| First session (initialization) | 10-20+ minutes | Generates feature test cases (appears to hang - normal) |
| Single feature | 5-15 minutes | Depends on complexity |
| Full application | Many hours | Across multiple sessions |
| Checkpoint review | 2-5 minutes | Per checkpoint (every N features) |
| Design iteration | 5-10 minutes | Per persona review |
Optimization Tips:
- Reduce feature count in spec for faster demos (20-50 features)
- Use YOLO mode to skip regression testing (`--yolo`)
- Increase parallel agents for independent features (`--max-concurrency 5`)
- Enable supervisory agent to eliminate human input waits
# 1. Create project with human input gates
python start.py
> Create new project
> Name: oauth-demo
> Use /create-spec to define app
# 2. Start agent (will pause for human input)
python autonomous_agent_demo.py --project-dir oauth-demo
# 3. Agent requests OAuth credentials
# Provide via UI: http://localhost:5173
# 4. Agent continues with credentials
# Checkpoint runs every 10 features
# Human reviews checkpoint report before continuing
# 1. Enable supervisory agent
export ENABLE_SUPERVISORY_AGENT=true
# 2. Start with YOLO mode + parallel execution
python autonomous_agent_demo.py \
--project-dir oauth-demo \
--yolo \
--parallel \
--max-concurrency 3 \
--supervisory-agent
# 3. Fully automated (no human intervention)
# - Supervisory agent provides mock credentials
# - YOLO mode skips regression tests
# - Parallel execution maximizes speed
# - Checkpoints still run (can auto-continue)
# 1. Standard mode with checkpoints
python autonomous_agent_demo.py --project-dir my-app
# 2. Checkpoint review every 10 features
# Reports in: checkpoints/checkpoint_N.md
# 3. Design iteration after major milestones
python design/review.py --project-dir my-app --iteration 1
# 4. Metrics dashboard
python metrics/dashboard.py --project-dir my-app
# 5. Final performance report
python metrics/report_generator.py --project-dir my-app
"Claude CLI not found"
# Install Claude CLI
curl -fsSL https://claude.ai/install.sh | bash # macOS/Linux
irm https://claude.ai/install.ps1 | iex # Windows
"Not authenticated with Claude"
claude login
"Appears to hang on first run"
- Normal! Initializer generates detailed test cases (10-20 minutes)
- Watch for `[Tool: ...]` output to confirm agent is working
- Check `agent_output.log` for progress
"Command blocked by security hook"
- Agent tried to run command not in allowlist
- Security system working as intended
- Add to `ALLOWED_COMMANDS` in `security.py` if needed
curl -X POST http://localhost:8000/api/projects/my-app/features/1/human-input \
-H "Content-Type: application/json" \
-d '{"field_id": "value"}'
"Pause not working - agents still running"
# Check pause signal file
ls -la <project-dir>/.autocoder/.pause_drain
# Check orchestrator logs
tail -f agent_output.log | grep -i drain
# Verify process manager
curl http://localhost:8000/api/projects/my-app/agent/status
"Playwright CLI commands failing"
# Install Playwright browsers
npx playwright install
# Check playwright-cli is in PATH
which playwright-cli # macOS/Linux
where playwright-cli # Windows
# Verify in security allowlist
grep playwright-cli security.py
"Database locked errors"
- Multiple processes accessing `features.db`
- Stop agent before running manual queries
- Check for stale `.agent.lock` file
"UI not showing latest features"
# Rebuild UI
cd ui && npm run build
# Or use dev mode
cd ui && npm run dev
# Enable debug output
export DEBUG=1
# Check logs
tail -f agent_output.log
tail -f .autocoder/debug.log
# WebSocket debugging (browser console)
# Enable: localStorage.debug = 'socket.io-client:*'
- Check `docs/TROUBLESHOOTING.md`
- Review test files in `tests/` for examples
- Read UAT script: `docs/UAT_HUMAN_INPUT_AND_PAUSE.md`
- Check GitHub issues
- Review prompt templates in `.claude/templates/`
Comprehensive docs in docs/:
- DEVELOPER_GUIDE.md - Contributing guide
- TROUBLESHOOTING.md - Common issues & fixes
- SKIP_MANAGEMENT_USER_GUIDE.md - Skip management
- UAT_HUMAN_INPUT_AND_PAUSE.md - UAT for new features
- PRD_TO_IMPLEMENTATION_MAPPING.md - Feature tracking
- ✅ Human input system with supervisory agent mode
- ✅ Pause/drain mode for graceful shutdown
- ✅ Playwright CLI integration (simpler than MCP)
- ✅ Checkpoint system with auto-fix
- ✅ Design iteration with multi-persona review
- ✅ Parallel execution with dependency resolution
- ✅ Metrics dashboard and ROI analysis
- ✅ Skip management with blocker classification
- ✅ React UI with real-time WebSocket updates
- UI for human input requests (modal/form)
- UI for checkpoint review and approval
- Design iteration UI with persona feedback
- Enhanced supervisory agent with learning
- Multi-project batch processing
- Cloud deployment (Docker + Kubernetes)
- Team collaboration features
- Advanced scheduling (time-based runs)
This project is licensed under the GNU Affero General Public License v3.0.
See LICENSE.md for full details.
Copyright (C) 2026 Leon van Zyl https://leonvanzyl.com
This project builds upon the original autonomous coding agent by Leon van Zyl, enhanced with enterprise-grade features including:
- Human-in-the-loop controls (human input gates + supervisory agents)
- Graceful pause/drain mode
- Checkpoint system for automated quality review
- Design iteration workflow with multi-persona review
- Parallel execution with dependency-aware scheduling
- Comprehensive metrics and performance tracking
- Playwright CLI integration (replacing MCP server)
- Production-ready security and testing
Original Project: https://github.com/leonvanzyl/autonomous-coding
Created by: Leon van Zyl (https://leonvanzyl.com)
Enhanced Fork: https://github.com/ArchitectVS7/autocoder-2
Special thanks to the open-source community and the Anthropic team for the Claude Agent SDK.
Built with ❤️ for developers who ship production-quality code faster.
