InsightAgent AI is an agentic RAG application built with LangGraph and FastAPI. It routes user queries through planner, retrieval, reasoning, and action nodes.
The goal is to build an intelligent, agent-based system that can understand user queries, retrieve relevant information, reason over it, and take appropriate actions to support real-time decision-making.
This project demonstrates how to combine large language models with structured workflows, memory systems, and external data sources to deliver scalable, production-ready AI assistants.
Modern organizations work with large volumes of unstructured data, including documents, reports, logs, and knowledge bases. This makes it difficult to:
- Retrieve relevant information quickly
- Preserve context across interactions
- Make informed decisions in real time
- Integrate AI capabilities into existing tools and workflows
Traditional chatbot systems often fall short because they typically:
- Lack reliable memory
- Do not reason effectively across steps
- Cannot execute meaningful actions
- Return generic, low-context responses
InsightAgent AI addresses these challenges by:
- Using RAG (Retrieval-Augmented Generation) to ground responses in real data
- Implementing an agentic LangGraph workflow to dynamically route tasks (`retrieve`, `reason`, `act`)
- Maintaining short-term and long-term memory for contextual continuity
- Supporting tool and action execution for extensibility
- Exposing a scalable FastAPI layer for real-time applications
Core components:
- Long-term vector memory in Qdrant
- Short-term session memory in Redis, with an in-memory fallback (sketched below)
- OpenAI chat model responses
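The Redis-with-fallback behavior can be pictured as a minimal sketch; the `SessionMemory` class and its methods below are hypothetical, not the project's actual `memory_manager.py`:

```python
import json

import redis  # provided by the `redis` package

class SessionMemory:
    """Short-term session memory: Redis when reachable, in-process dict otherwise."""

    def __init__(self, host: str = "localhost", port: int = 6379) -> None:
        self._fallback: dict[str, list[dict]] = {}
        try:
            self._redis = redis.Redis(host=host, port=port, socket_connect_timeout=1)
            self._redis.ping()  # raises if no server is listening
        except redis.exceptions.RedisError:
            self._redis = None  # degrade gracefully to in-memory storage

    def append(self, session_id: str, turn: dict) -> None:
        if self._redis is not None:
            self._redis.rpush(f"session:{session_id}", json.dumps(turn))
        else:
            self._fallback.setdefault(session_id, []).append(turn)

    def history(self, session_id: str) -> list[dict]:
        if self._redis is not None:
            raw = self._redis.lrange(f"session:{session_id}", 0, -1)
            return [json.loads(item) for item in raw]
        return self._fallback.get(session_id, [])
```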
Request flow:
- Accepts questions via the API (`POST /query`)
- Uses a planner to route between (sketched below):
  - `retrieve` -> semantic search in Qdrant
  - `reason` -> LLM answer generation
  - `act` -> external web search tool path
- Stores session context and long-term knowledge
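A minimal sketch of that routing with LangGraph's conditional edges; the state shape, node bodies, and planner heuristic are simplified placeholders rather than the project's actual `rag_graph.py` and `planner_node.py`:

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph

class AgentState(TypedDict):
    query: str
    route: str
    answer: str

def planner(state: AgentState) -> AgentState:
    # Toy heuristic standing in for planner_node.py's routing logic.
    q = state["query"].lower()
    state["route"] = "act" if q.startswith(("search the web for", "google", "latest")) else "retrieve"
    return state

def retrieve(state: AgentState) -> AgentState:
    # Semantic search in Qdrant would populate retrieved context here.
    return state

def reason(state: AgentState) -> AgentState:
    state["answer"] = "..."  # LLM answer generation would go here
    return state

def act(state: AgentState) -> AgentState:
    # External web search tool path.
    return state

graph = StateGraph(AgentState)
graph.add_node("planner", planner)
graph.add_node("retrieve", retrieve)
graph.add_node("reason", reason)
graph.add_node("act", act)
graph.set_entry_point("planner")
graph.add_conditional_edges("planner", lambda s: s["route"],
                            {"retrieve": "retrieve", "act": "act"})
graph.add_edge("retrieve", "reason")
graph.add_edge("act", "reason")
graph.add_edge("reason", END)
app = graph.compile()

result = app.invoke({"query": "What is semantic search?", "route": "", "answer": ""})
```

Both branches funnel back through `reason`, so every response is produced by the LLM over whatever context the chosen path collected.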
- Python 3.10+
- FastAPI + Uvicorn
- Streamlit
- LangGraph + LangChain
- Qdrant (`qdrant-client`)
- Redis (optional)
- OpenAI (`langchain-openai`)
- Optional web search providers: Google CSE, SerpAPI, Tavily
```
streamlit_app.py          # Streamlit user interface
src/
    main.py               # CLI smoke-run entrypoint
    config.py             # Env + YAML config loader
    api/api_server.py     # FastAPI endpoints
    agents/
        rag_graph.py      # LangGraph workflow
        planner_node.py   # Routing logic
        retriever.py      # Qdrant retrieval
        reasoner.py       # LLM reasoning
        ingestion.py      # Compatibility exports for ingestion helpers
        memory_manager.py # Session + long-term memory
    ingestion/
        loaders.py        # File loader handlers (PDF/Markdown/Text/JSON)
        service.py        # Chunking + vector-store ingestion service
        errors.py         # Typed ingestion exceptions
    utils/
        qdrant_store.py   # Shared Qdrant client/store helper
        logger.py         # Log configuration
    dependencies.py       # Shared app/service factories
```
- Create and activate a virtual environment.
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure `.env`:

  ```
  OPENAI_API_KEY=your_openai_key
  LANGSMITH_API_KEY=optional
  REDIS_HOST=localhost
  REDIS_PORT=6379
  # optional endpoint auth:
  # API_AUTH_TOKEN=your_token
  # optional Streamlit UI settings:
  # STREAMLIT_API_BASE_URL=http://127.0.0.1:8000
  # STREAMLIT_API_KEY=your_token
  # optional search provider credentials:
  # GOOGLE_SEARCH_API_KEY=your-google-api-key
  # GOOGLE_SEARCH_ENGINE_ID=your-google-cse-id
  # SERPAPI_API_KEY=your-serpapi-key
  # TAVILY_API_KEY=your-tavily-key
  ```

Optional LangSmith tracing variables:

```
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_API_KEY=your_langsmith_api_key
LANGSMITH_PROJECT=your_langsmith_project_name
```

Main runtime config is in `config.yaml`.
Important sections:
- `llm.model_name` (current default: `gpt-5.2`)
- `qdrant` (local/server mode and collection)
- `retriever.top_k`
- `api.rate_limit_per_minute`
- `logging`
- `search.enabled`
- `search.provider`
- `ingestion.chunk_size` (optional)
- `ingestion.chunk_overlap` (optional)
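A loader along the following lines would merge `config.yaml` with environment variables; this is a hypothetical sketch, not the actual `src/config.py`:

```python
import os

import yaml  # PyYAML

def load_config(path: str = "config.yaml") -> dict:
    """Read the YAML runtime config and overlay secrets from the environment."""
    with open(path, "r", encoding="utf-8") as fh:
        config = yaml.safe_load(fh) or {}
    # Secrets live in .env / the environment, never in config.yaml.
    config["openai_api_key"] = os.environ["OPENAI_API_KEY"]  # required at runtime
    config["api_auth_token"] = os.getenv("API_AUTH_TOKEN")   # optional endpoint auth
    return config

config = load_config()
top_k = config.get("retriever", {}).get("top_k", 5)  # 5 as an assumed fallback
```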
Default is local file-backed mode:

```yaml
qdrant:
  mode: "local"
  path: "./qdrant_db"
  collection_name: "knowledge"
```

No external Qdrant service is required in this mode.
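In this mode `qdrant-client` simply opens the directory on disk; a minimal sketch of what `utils/qdrant_store.py` presumably wraps:

```python
from qdrant_client import QdrantClient

# Local, file-backed mode: no external Qdrant service required.
client = QdrantClient(path="./qdrant_db")

# Server mode would connect over HTTP instead:
# client = QdrantClient(url="http://localhost:6333")

print(client.get_collections())  # lists collections, e.g. "knowledge"
```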
Supported providers:
- `google_cse`
- `serpapi_google`
- `tavily`

Example:

```yaml
search:
  enabled: true
  provider: "google_cse"
  max_results: 5
```

Search-like queries such as `search the web for ...`, `latest ...`, or `google ...` route to the action node and use the configured provider.
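Those trigger phrases suggest a simple prefix/keyword check; the sketch below is illustrative only, not the actual planner heuristic:

```python
SEARCH_PREFIXES = ("search the web for", "google ")
SEARCH_KEYWORDS = ("latest",)

def is_search_query(query: str) -> bool:
    """Return True when a query should route to the action (web search) node."""
    q = query.strip().lower()
    return q.startswith(SEARCH_PREFIXES) or any(word in q for word in SEARCH_KEYWORDS)

assert is_search_query("Search the web for LangGraph tutorials")
assert not is_search_query("What is semantic search?")
```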
Run the CLI smoke-run entrypoint:

```bash
python -m src.main
```

Start the API server:

```bash
uvicorn src.api.api_server:app --host 127.0.0.1 --port 8000 --reload
```

Run the backend first, then start the UI in a second terminal:

```bash
streamlit run streamlit_app.py
```

UI URL: http://127.0.0.1:8501

Open backend docs: http://127.0.0.1:8000/docs
Endpoints:
- `GET /health` -> service and memory backend status
- `POST /query` -> primary user response endpoint
- `GET /memory?session_id=...` -> inspect short-term memory for a session
The Streamlit UI uses these same endpoints over HTTP.
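In Python, the equivalent of the UI's request looks roughly like this (a sketch; `API_BASE` and the header value are placeholders):

```python
import requests

API_BASE = "http://127.0.0.1:8000"  # or STREAMLIT_API_BASE_URL

def ask(query: str, session_id: str) -> dict:
    """POST a question to /query and return the JSON response."""
    resp = requests.post(
        f"{API_BASE}/query",
        json={"query": query, "session_id": session_id},
        headers={"X-API-Key": "your_token"},  # only needed if API_AUTH_TOKEN is set
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```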
Example request:

```bash
curl -X POST http://127.0.0.1:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query":"What is semantic search?","session_id":"user1"}'
```

Supported file loaders:
- PDF (`.pdf`)
- Markdown (`.md`, `.markdown`)
- Text (`.txt`, `.text`)
- JSON (`.json`)
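Extension-based dispatch might look like the following; `load_document` is a hypothetical helper, not the actual `loaders.py` API:

```python
import json
from pathlib import Path

TEXT_SUFFIXES = {".md", ".markdown", ".txt", ".text"}

def load_document(path: str) -> str:
    """Return a file's text content based on its extension (illustrative only)."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix in TEXT_SUFFIXES:
        return p.read_text(encoding="utf-8")
    if suffix == ".json":
        # Normalize JSON to a string so it can be chunked like any document.
        return json.dumps(json.loads(p.read_text(encoding="utf-8")), indent=2)
    if suffix == ".pdf":
        raise NotImplementedError("PDF extraction needs a parser such as pypdf")
    raise ValueError(f"Unsupported file type: {suffix}")
```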
Example usage:

```python
from src.ingestion.service import ingest_directory, ingest_file

ingest_file("docs/architecture.md")
ingest_directory("docs/", recursive=True)
```

Run tests:

```bash
python -m pytest -q
```

Quick syntax check:

```bash
python -m compileall src tests
```

- If Redis is not running, the app falls back to in-memory short-term storage.
- If `API_AUTH_TOKEN` is set, all endpoints require an `X-API-Key` header.
- The app requires `OPENAI_API_KEY` at runtime.
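One way such a header check can be wired in FastAPI (a sketch assuming a simple shared-token dependency; the project's actual implementation may differ):

```python
import os

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()

async def require_api_key(x_api_key: str | None = Header(default=None)) -> None:
    """Reject requests without the expected X-API-Key when API_AUTH_TOKEN is set."""
    expected = os.getenv("API_AUTH_TOKEN")
    if expected and x_api_key != expected:
        raise HTTPException(status_code=401, detail="Invalid or missing X-API-Key")

@app.get("/health")
async def health(_: None = Depends(require_api_key)) -> dict:
    return {"status": "ok"}
```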