Python, FastAPI, Uvicorn, LangChain, LangGraph, Google Gen AI, React, TypeScript, Vite
Sandbox project built to experiment with multimodal AI pipelines and stateful orchestration.
The goal is to take a user-uploaded portrait, analyze it, and dynamically transform it using a graph-based agent workflow.
- LangGraph & LangChain: Handle the agent workflow, state management, and routing logic between execution nodes.
- Google Gen AI (gemini-2.5-flash-image): Used both as a vision tool to analyze the image and as a generative engine for the final image-to-image modification.
- LangSmith: Hooked up for quick debugging, viewing execution traces, and tracking agent states in real time.
- Analyze: The system inspects the input image to detect visible features (skin, jawline, hair)
- Route: It identifies the subject's gender to branch the state into a specific processing path
- Enhance: It builds a tailored prompt based on the analysis and fires it at the Gemini vision model with customized safety overrides to generate a high-end, edited version of the portrait
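The Route step can be pictured as a small function that reads the classified gender and names the next step. This is a minimal sketch, not the project's router: the `gender` field name, the `RouteState` type, and the `"stop"` return value for the no-person case are assumptions made for illustration.

```python
from typing import Literal, TypedDict

Gender = Literal["male", "female", "none"]


class RouteState(TypedDict, total=False):
    # Hypothetical field name; the real state schema may differ.
    gender: Gender


def route_after_classification(state: RouteState) -> str:
    """Pick the next step from the classified gender."""
    gender = state.get("gender", "none")
    if gender in ("male", "female"):
        # Both values continue to feature analysis; the gender itself
        # would select the gender-specific schema downstream.
        return "feature_analysis"
    # Assumption: with no person detected, the run stops early.
    return "stop"


print(route_after_classification({"gender": "female"}))
print(route_after_classification({"gender": "none"}))
```

A function of this shape could be handed to a conditional edge in the graph, with the returned string mapped to a node name.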
- image_cache: Uploads the image and creates a 5-minute Gemini Context Cache if the file exceeds 1024 tokens, to save bandwidth and API costs.
- gender_classification: A zero-temperature vision gatekeeper that detects whether a human is present and returns their gender (male, female, or none) as raw JSON.
- feature_analysis: Uses Pydantic validation schemas, chosen dynamically based on gender, to detect visible face/body parts (eyes, jawline, skin, hair).
- image_retouch_specifier: A conditional prompt builder that maps the detected features into pro-grade retouching instructions.
- execute_image_enhancement: The execution node. Fires the final prompt at gemini-2.5-flash-image, bypasses false-positive safety locks, and saves the output image.
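Since gender_classification returns raw JSON, the reply has to be parsed defensively before it reaches the rest of the graph. The sketch below shows one way to do that with the standard library only. The `parse_gender_response` helper and the `{"gender": ...}` payload shape are assumptions; the project's actual response format is not shown here.

```python
import json

ALLOWED_GENDERS = {"male", "female", "none"}
FENCE = "`" * 3  # Markdown code-fence marker that models sometimes add


def parse_gender_response(raw: str) -> str:
    """Return 'male', 'female', or 'none' from a raw JSON model reply."""
    text = raw.strip()
    # Strip a surrounding Markdown fence (and optional "json" tag) if present.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        # Unparseable replies are treated as "no person detected".
        return "none"
    if not isinstance(payload, dict):
        return "none"
    gender = str(payload.get("gender", "none")).lower()
    return gender if gender in ALLOWED_GENDERS else "none"


print(parse_gender_response('{"gender": "Female"}'))
```

Falling back to none on any malformed reply keeps the gatekeeper conservative: an unclear answer never pushes an image further into the pipeline.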
The entire workflow relies on a single, stateful AgentState object managed by LangGraph. Each node in the graph reads from this shared context and returns only the fields it intends to update.
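The partial-update pattern can be illustrated in plain Python. This is a sketch under assumptions: the field names in `AgentState` and the `apply_update` helper are made up for the example, and the merge is a stand-in for what LangGraph does when a node returns a dict (by default, returned keys overwrite the matching keys in the state).

```python
from typing import TypedDict


class AgentState(TypedDict, total=False):
    # Field names are illustrative assumptions, not the project's schema.
    image_path: str
    gender: str
    features: list[str]
    retouch_prompt: str


def image_retouch_specifier(state: AgentState) -> AgentState:
    """Read shared state and return only the field this node updates."""
    features = state.get("features", [])
    prompt = "Retouch the portrait, focusing on: " + ", ".join(features)
    return {"retouch_prompt": prompt}


def apply_update(state: AgentState, update: AgentState) -> AgentState:
    """Merge a partial update over the existing state (last write wins)."""
    return {**state, **update}


state: AgentState = {
    "image_path": "uploads/portrait.jpg",
    "gender": "female",
    "features": ["skin", "hair"],
}
state = apply_update(state, image_retouch_specifier(state))
print(state["retouch_prompt"])
```

Because the node returns a single key, fields such as `gender` and `features` pass through untouched, which keeps each node small and easy to trace.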
[Input State]
│
▼
(image_cache)
│
▼
(gender_classification)
│
▼
(feature_analysis)
│
▼
(image_retouch_specifier)
│
▼
(execute_image_enhancement)
│
▼
[Output State]
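The execution order in the diagram can be mimicked with a short plain-Python runner. This is not LangGraph's API: the node bodies are stubs returning placeholder values, and `run_pipeline` and the `trace` field are hypothetical names added to make the order visible.

```python
from typing import Any, Callable

State = dict[str, Any]
Node = Callable[[State], State]


# Stub nodes: each returns placeholder data instead of calling Gemini.
def image_cache(state: State) -> State:
    return {"cache_checked": True}


def gender_classification(state: State) -> State:
    return {"gender": "female"}


def feature_analysis(state: State) -> State:
    return {"features": ["skin", "hair"]}


def image_retouch_specifier(state: State) -> State:
    return {"retouch_prompt": "focus on " + ", ".join(state["features"])}


def execute_image_enhancement(state: State) -> State:
    return {"output_path": "storage/enhanced.png"}


# Same top-to-bottom order as the diagram.
PIPELINE: list[tuple[str, Node]] = [
    ("image_cache", image_cache),
    ("gender_classification", gender_classification),
    ("feature_analysis", feature_analysis),
    ("image_retouch_specifier", image_retouch_specifier),
    ("execute_image_enhancement", execute_image_enhancement),
]


def run_pipeline(initial: State) -> State:
    """Run each node in order, merging its partial update into the state."""
    state = dict(initial)
    for name, node in PIPELINE:
        state = {**state, **node(state)}
        state["trace"] = state.get("trace", []) + [name]
    return state


result = run_pipeline({"image_path": "uploads/portrait.jpg"})
print(result["trace"])
```

In the real project the same ordering would be expressed as nodes and edges on a LangGraph graph rather than a Python list.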
GOOGLE_API_KEY=
LANGCHAIN_TRACING_V2=true
LANGSMITH_ENDPOINT=https://eu.api.smith.langchain.com
LANGCHAIN_API_KEY=lsv2_pt_
LANGSMITH_PROJECT="vision-chain"
cd backend && uvicorn main:app --reload --reload-dir .
cd frontend && npm run dev
vision-Chain/
├── backend/
│ ├── routers/
│ │ ├── agents/
│ │ │ ├── graph.py
│ │ │ ├── nodes.py
│ │ │ ├── router.py
│ │ │ ├── schemas.py
│ │ │ └── state.py
│ │ └── upload.py
│ └── main.py
├── frontend/
│ ├── src/
│ │ ├── App.tsx
│ │ ├── index.css
│ │ └── main.tsx
│ ├── index.html
│ ├── package.json
│ ├── tsconfig.json
│ └── vite.config.ts
├── static/
├── storage/
├── uploads/
├── .env
└── .gitignore
This project is licensed under the MIT License. Feel free to do whatever you want with the code.


