A LangGraph-based system that coordinates multiple AI agents for image processing tasks using the Agent-Supervisor pattern.
This project implements a multi-agent system based on LangGraph's Agent-Supervisor pattern, where a supervisor agent coordinates multiple specialized image processing agents. The system demonstrates the use of LangGraph's edgeless graph architecture and Command construct for agent coordination.
The system consists of:
- **Supervisor Agent**
  - Coordinates the workflow
  - Makes intelligent decisions about task sequencing
  - Routes requests to appropriate agents using LangGraph's Command construct
- **Task Agents**
  - Image Generation Agent: Handles image creation requests
  - Text Overlay Agent: Adds text to images
  - Background Removal Agent: Removes image backgrounds
The graph visualization above shows:
- The initial entry point (START) connecting to the Supervisor
- The Supervisor node which coordinates all task agents
- Task agent nodes for specific image processing operations
- The potential paths through the system based on user requests
- **Edgeless Graph Architecture**
  - Instead of explicit edges between nodes, routing is handled by agent Commands
  - Each agent returns a Command that specifies the next agent to run
  - This simplifies the graph structure and makes it more flexible
- **Command Construct**

  ```python
  Command(
      goto="next_agent",
      update={
          "next_agent": "next_agent",
          "current_task": "current_task",
          "messages": [...],
      },
  )
  ```

  - `goto`: Specifies the next agent to execute
  - `update`: Updates the state that's passed between agents
- **StateGraph**

  ```python
  builder = StateGraph(AgentState)
  builder.add_node("supervisor", create_supervisor_agent())
  builder.add_edge(START, "supervisor")
  ```

  - Manages state transitions between agents
  - Only requires an initial edge from START to the supervisor
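The edgeless pattern above can be illustrated without LangGraph at all. The stdlib-only sketch below (hypothetical node names and state fields, not the project's actual code) mimics the routing loop: each node returns a Command-like dict with a `goto` target and a state `update`, and the only hard-wired edge is START to the supervisor.

```python
# Stdlib-only sketch of edgeless Command-style routing.
# Node names and state fields are illustrative, not the project's real ones.
from typing import Callable

END = "__end__"

def supervisor(state: dict) -> dict:
    # Dispatch the next pending task, or finish when none remain.
    pending = state["pending_tasks"]
    goto = pending[0] if pending else END
    return {"goto": goto, "update": {"pending_tasks": pending[1:]}}

def image_generation(state: dict) -> dict:
    # A task agent always routes back to the supervisor.
    return {"goto": "supervisor",
            "update": {"image_url": "https://example.com/img.png"}}

def text_overlay(state: dict) -> dict:
    return {"goto": "supervisor", "update": {"text_added": True}}

NODES: dict[str, Callable[[dict], dict]] = {
    "supervisor": supervisor,
    "image_generation": image_generation,
    "text_overlay": text_overlay,
}

def run(state: dict) -> dict:
    node = "supervisor"                 # only explicit edge: START -> supervisor
    while node != END:
        command = NODES[node](state)    # each node returns a Command-like dict
        state.update(command["update"])
        node = command["goto"]          # routing comes from the node itself
    return state

result = run({"pending_tasks": ["image_generation", "text_overlay"]})
```

Note how no edges between the supervisor and the task agents are declared anywhere; the route taken is decided at runtime by each node's return value, which is the flexibility the Command construct provides.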
- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a `.env` file with your OpenAI API key:

  ```
  OPENAI_API_KEY=your-key-here
  ```
From the project root directory, run:

```bash
python -m src.main
```

Example inputs to try:
- "Generate an image of a sunset and add text 'Beautiful Evening' to it"
- "Create an image of a mountain landscape and remove its background"
- "Generate an image of a cat with 'Hello' text"
The system will:
- Take your input
- Use the supervisor to determine the sequence of tasks
- Route the request through appropriate agents
- Show the execution path and final result
The system includes an evaluation framework to assess the performance and correctness of the multi-agent workflow.
For detailed information about the evaluation framework, see Evaluation Documentation.
```
image_processing_agents/
├── src/
│   ├── agents/
│   │   ├── supervisor.py          # Supervisor agent implementation
│   │   ├── image_generation.py
│   │   ├── text_overlay.py
│   │   └── background_removal.py
│   ├── evaluation/                # Evaluation framework
│   │   ├── evaluators.py          # Evaluation functions
│   │   ├── create_dataset.py      # Test dataset creation
│   │   └── run_evaluation.py      # Main evaluation script
│   ├── agent_types/
│   │   └── state.py               # State type definitions
│   ├── config/
│   │   └── settings.py            # Configuration settings
│   └── main.py                    # Main execution script
├── .env                           # Environment variables
├── .gitignore
└── requirements.txt
```
- **State Management**
  - Uses TypedDict for type-safe state management
  - Tracks messages, current task, and image URLs
  - Maintains execution history
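The state described above might be modeled as follows. This is a hedged sketch of the shape, not the project's actual `src/agent_types/state.py`; the field names here are illustrative assumptions.

```python
# Hypothetical shape of the shared agent state (illustrative field names,
# not the project's real definitions).
from typing import Optional, TypedDict

class AgentState(TypedDict):
    messages: list[str]           # running message history (execution trail)
    current_task: Optional[str]   # task the supervisor last dispatched
    next_agent: Optional[str]     # routing target set by the last Command
    image_url: Optional[str]      # URL of the most recent image artifact

state: AgentState = {
    "messages": ["Generate an image of a sunset"],
    "current_task": None,
    "next_agent": None,
    "image_url": None,
}
```

Because `AgentState` is a `TypedDict`, static checkers can flag missing or misspelled keys in state updates while the runtime value remains a plain dict, which is what LangGraph passes between nodes.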
- **Agent Communication**
  - Agents communicate through state updates
  - Each agent adds its actions to the message history
  - The supervisor makes decisions based on the complete context
- **Routing Logic**
  - The supervisor analyzes both the original request and the current state
  - Makes sequential decisions about task execution
  - Uses an LLM to understand complex requests
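To make the routing decision concrete: the real supervisor delegates this judgment to an LLM, but a crude keyword heuristic (purely a stand-in, not the project's logic) shows the shape of the output, an ordered list of task agents derived from the request.

```python
# Keyword-based stand-in for the supervisor's LLM routing decision.
# The actual system prompts an LLM; this only illustrates the output shape.
def plan_tasks(request: str) -> list[str]:
    """Map a user request to an ordered list of task agents."""
    text = request.lower()
    tasks = []
    if "generate" in text or "create" in text or "image" in text:
        tasks.append("image_generation")
    if "text" in text:
        tasks.append("text_overlay")
    if "background" in text:
        tasks.append("background_removal")
    return tasks
```

Run against the example inputs above, this would plan `["image_generation", "text_overlay"]` for the sunset request and `["image_generation", "background_removal"]` for the mountain landscape; the LLM-based supervisor makes the same kind of decision but re-evaluates after every completed task rather than planning once up front.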
This implementation follows the LangGraph Agent-Supervisor tutorial: LangGraph Multi-Agent Tutorial
- **Test Dataset**
  - Predefined test cases with expected outcomes
  - Stored in LangSmith for tracking and analysis
- **LLM Judge (GPT-4)**
  - Evaluates task completion accuracy
  - Analyzes agent execution patterns
  - Provides detailed reasoning for scores
- **Metrics**
  - Task Completion Score (0.0 - 1.0)
  - Node Execution Score (0.0 - 1.0)
  - Execution Time
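One plausible way to compute a node-execution score in the 0.0-1.0 range is the fraction of the expected node sequence reproduced, in order, by the actual run. This is a hedged sketch of such a metric, not the project's exact scoring (which the LLM judge performs).

```python
# Illustrative node-execution score: fraction of the expected node sequence
# that appears, in order, in the actual execution trace.
def node_execution_score(expected: list[str], actual: list[str]) -> float:
    if not expected:
        return 1.0
    matched = 0
    trace = iter(actual)          # shared iterator enforces in-order matching
    for node in expected:
        for seen in trace:
            if seen == node:
                matched += 1
                break
    return matched / len(expected)
```

For example, an expected path of supervisor, image_generation, supervisor, text_overlay scores 1.0 when the trace contains those nodes in that order, and proportionally less when agents were skipped or run out of order.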
- **Results Storage**
  - Evaluation results are stored in LangSmith
  - Detailed logs of agent interactions
  - Performance metrics and analysis
