5th Member is a self-hosted conversational AI stack with long-term memory, built on Ollama, Qdrant, and FastAPI.
It acts as your team’s quiet 5th member — listening, learning, and responding with context-aware intelligence.
- Conversational Memory: Stores and recalls chat history through Qdrant vector search
- RAG Integration: Retrieves past context to enrich current prompts automatically
- FastAPI Backend: Lightweight async API server ready for local or containerized deployment
- Ollama Integration: Uses locally-running Ollama models for offline or private inference
- Docker Support: Works seamlessly with a local Dockerized Qdrant setup
- Progressive Summarization: Keeps your long-term memory concise and relevant
| Component | Purpose |
|---|---|
| FastAPI | API framework for async chat & RAG endpoints |
| Ollama | Local LLM runner (e.g., llama3, mistral, codellama) |
| Qdrant | Vector database to store and retrieve conversation embeddings |
| httpx | Async HTTP client for streaming LLM responses |
| Python 3.11+ | Core runtime |
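Since httpx's job here is streaming tokens from Ollama as they are generated, here is a minimal sketch of that pattern, assuming Ollama's documented `/api/generate` NDJSON streaming format; it is not this repo's actual code, and the model name is just an example.

```python
# Sketch: consume a streamed Ollama completion with httpx.
# Assumes Ollama's documented /api/generate NDJSON streaming format;
# not this repo's actual implementation.
import asyncio
import json

import httpx

OLLAMA_URL = "http://localhost:11434"

async def stream_completion(prompt: str, model: str = "llama3") -> str:
    """Stream a completion from Ollama and return the full text."""
    parts = []
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST",
            f"{OLLAMA_URL}/api/generate",
            json={"model": model, "prompt": prompt, "stream": True},
        ) as resp:
            resp.raise_for_status()
            async for line in resp.aiter_lines():
                if not line:
                    continue
                chunk = json.loads(line)  # one JSON object per line
                parts.append(chunk.get("response", ""))
                if chunk.get("done"):
                    break
    return "".join(parts)

if __name__ == "__main__":
    print(asyncio.run(stream_completion("Say hello in one sentence.")))
```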
Clone the repository:

```
git clone https://github.com/utsav-develops/5thMember.git
cd 5thMember
```

Create and activate a virtual environment:

```
python -m venv .venv

# Activate:
source .venv/bin/activate    # macOS / Linux
# or
.venv\Scripts\activate       # Windows
```

Install dependencies:

```
pip install -r requirements.txt
```

Create a `.env` file in your project root directory:
```
# Ollama Settings
OLLAMA_URL=http://localhost:11434
OLLAMA_TEXT_MODEL=llama3
OLLAMA_EMBED_MODEL=embedding-model

# Qdrant Settings
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION=memories
```
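For reference, a minimal sketch of how these settings could be loaded in Python, assuming the `python-dotenv` package; the repo's actual loading code may differ.

```python
# Hypothetical settings loader -- the repo may load its config differently.
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
OLLAMA_TEXT_MODEL = os.getenv("OLLAMA_TEXT_MODEL", "llama3")
OLLAMA_EMBED_MODEL = os.getenv("OLLAMA_EMBED_MODEL", "embedding-model")
QDRANT_URL = os.getenv("QDRANT_URL", "http://localhost:6333")
QDRANT_COLLECTION = os.getenv("QDRANT_COLLECTION", "memories")
```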
Start Qdrant with Docker:

```
docker run -p 6333:6333 -v qdrant_storage:/qdrant/storage qdrant/qdrant
```

Pull a model and start Ollama:

```
ollama pull llama3
ollama serve
```

💡 You can also use other models like `mistral`, `codellama`, or `phi3` by updating the `.env` file.

Start the API server:

```
uvicorn main:app --reload --port 8000
```

Server runs at: http://localhost:8000 (FastAPI also serves interactive docs at http://localhost:8000/docs by default).
Send messages via `/chat`:

```
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain quantum entanglement"}'
```
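The same request from Python, as a hedged sketch: the payload mirrors the curl example above, and the response is printed as-is because its exact schema is server-defined.

```python
# Hypothetical Python client for the /chat endpoint; payload mirrors
# the curl example above. The response body is printed as-is because
# its exact schema is defined by the server.
import httpx

resp = httpx.post(
    "http://localhost:8000/chat",
    json={"prompt": "Explain quantum entanglement"},
    timeout=120.0,
)
resp.raise_for_status()
print(resp.json())
```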
The `/rag-chat` endpoint uses Qdrant-stored context to enhance replies:

```
curl -X POST http://localhost:8000/rag-chat \
  -H "Content-Type: application/json" \
  -d '{"user_id": "user123", "prompt": "Remind me what we discussed earlier about ML agents"}'
```

Every message is embedded and stored in Qdrant. When a new message arrives (see the sketch after this list):
- Qdrant searches for similar past messages by vector similarity
- Relevant context is retrieved
- A new, context-rich prompt is constructed
- Ollama generates a response with awareness of previous conversations
- Older memories are periodically summarized and compressed
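A condensed sketch of the retrieval steps in that flow, assuming the `qdrant-client` package and Ollama's `/api/embeddings` endpoint; the function, field, and collection names are illustrative, not the repo's actual `utils.py` code.

```python
# Illustrative retrieval step -- names and payload fields are assumptions,
# not this repo's actual utils.py implementation.
import httpx
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

OLLAMA_URL = "http://localhost:11434"
qdrant = QdrantClient(url="http://localhost:6333")

def embed(text: str) -> list[float]:
    """Embed text via Ollama's /api/embeddings endpoint."""
    resp = httpx.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": "embedding-model", "prompt": text},
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def build_prompt(user_id: str, message: str, limit: int = 5) -> str:
    """Retrieve this user's most similar past messages as context."""
    hits = qdrant.search(
        collection_name="memories",
        query_vector=embed(message),
        query_filter=Filter(
            must=[FieldCondition(key="user_id", match=MatchValue(value=user_id))]
        ),
        limit=limit,
    )
    context = "\n".join((hit.payload or {}).get("text", "") for hit in hits)
    return f"Relevant past conversation:\n{context}\n\nUser: {message}"
```

The resulting prompt would then go to Ollama for generation (as in the streaming sketch earlier), with the new message and reply embedded and upserted back into the collection; periodic summarization would compact the oldest points.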
```
5thMember/
│
├── main.py            # FastAPI app and API endpoints
├── utils.py           # RAG logic, summarization, Qdrant operations
├── db.py              # Database (Qdrant) user management
├── requirements.txt   # Python dependencies
├── .env               # Environment variables
└── README.md          # Documentation (you’re reading this!)
```
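Since `db.py` owns the Qdrant setup, here is a hedged sketch of what collection bootstrap can look like with a recent `qdrant-client`; it is not the repo's actual code, and the vector size must match your embedding model's output dimension (768 is only a placeholder).

```python
# Hypothetical collection bootstrap -- not the repo's actual db.py.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Create the collection once. The vector size must match the embedding
# model's output dimension (768 is just a placeholder here).
if not client.collection_exists("memories"):
    client.create_collection(
        collection_name="memories",
        vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    )
```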
You can use any Ollama-supported model:

```
ollama pull mistral
ollama pull phi3
ollama pull codellama
```

Then, update `.env`:

```
OLLAMA_TEXT_MODEL=phi3
```

- Use the `--reload` flag in Uvicorn for hot-reloading during development
- Check your Qdrant dashboard at http://localhost:6333/dashboard
- View container logs via Docker: `docker logs <container_id>`
MIT License © 2025 — Created by Utsav Acharya
“The best teammate never sleeps — it just keeps learning.”