5th Member is a self-hosted conversational AI stack with long-term memory, built on Ollama, Qdrant, and FastAPI.
It acts as your team’s quiet 5th member — listening, learning, and responding with context-aware intelligence.
- Conversational Memory: Stores and recalls chat history through Qdrant vector search
- RAG Integration: Retrieves past context to enrich current prompts automatically
- FastAPI Backend: Lightweight async API server ready for local or containerized deployment
- Ollama Integration: Uses locally-running Ollama models for offline or private inference
- Docker Support: Works seamlessly with a local Dockerized Qdrant setup
- Progressive Summarization: Keeps your long-term memory concise and relevant
| Component | Purpose |
|---|---|
| FastAPI | API framework for async chat & RAG endpoints |
| Ollama | Local LLM runner (e.g., llama3, mistral, codellama) |
| Qdrant | Vector database to store and retrieve conversation embeddings |
| httpx | Async HTTP client for streaming LLM responses |
| Python 3.11+ | Core runtime |
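Since httpx's job here is streaming tokens from Ollama as they are generated, here is a minimal sketch of that pattern, assuming Ollama's documented `/api/generate` NDJSON streaming format; it is not this repo's actual code, and the model name is just an example.

```python
# Sketch: consume a streamed Ollama completion with httpx.
# Assumes Ollama's documented /api/generate NDJSON streaming format;
# not this repo's actual implementation.
import asyncio
import json

import httpx

OLLAMA_URL = "http://localhost:11434"

async def stream_completion(prompt: str, model: str = "llama3") -> str:
    """Stream a completion from Ollama and return the full text."""
    parts = []
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST",
            f"{OLLAMA_URL}/api/generate",
            json={"model": model, "prompt": prompt, "stream": True},
        ) as resp:
            resp.raise_for_status()
            async for line in resp.aiter_lines():
                if not line:
                    continue
                chunk = json.loads(line)  # one JSON object per line
                parts.append(chunk.get("response", ""))
                if chunk.get("done"):
                    break
    return "".join(parts)

if __name__ == "__main__":
    print(asyncio.run(stream_completion("Say hello in one sentence.")))
```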
Clone the repository:

```
git clone https://github.com/utsav-develops/5thMember.git
cd 5thMember
```

Create and activate a virtual environment:

```
python -m venv .venv

# Activate:
source .venv/bin/activate    # macOS / Linux
# or
.venv\Scripts\activate       # Windows
```

Install dependencies:

```
pip install -r requirements.txt
```

Create a `.env` file in your project root directory:
```
# Ollama Settings
OLLAMA_URL=http://localhost:11434
OLLAMA_TEXT_MODEL=llama3
OLLAMA_EMBED_MODEL=embedding-model

# Qdrant Settings
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION=memories
```
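For reference, a minimal sketch of how these settings could be loaded in Python, assuming the `python-dotenv` package; the repo's actual loading code may differ.

```python
# Hypothetical settings loader -- the repo may load its config differently.
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
OLLAMA_TEXT_MODEL = os.getenv("OLLAMA_TEXT_MODEL", "llama3")
OLLAMA_EMBED_MODEL = os.getenv("OLLAMA_EMBED_MODEL", "embedding-model")
QDRANT_URL = os.getenv("QDRANT_URL", "http://localhost:6333")
QDRANT_COLLECTION = os.getenv("QDRANT_COLLECTION", "memories")
```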
Start Qdrant with Docker:

```
docker run -p 6333:6333 -v qdrant_storage:/qdrant/storage qdrant/qdrant
```

Pull a model and start Ollama:

```
ollama pull llama3
ollama serve
```

💡 You can also use other models like `mistral`, `codellama`, or `phi3` by updating the `.env` file.

Start the API server:

```
uvicorn main:app --reload --port 8000
```

Server runs at: http://localhost:8000 (FastAPI also serves interactive docs at http://localhost:8000/docs by default).
Send messages via `/chat`:

```
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain quantum entanglement"}'
```
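The same request from Python, as a hedged sketch: the payload mirrors the curl example above, and the response is printed as-is because its exact schema is server-defined.

```python
# Hypothetical Python client for the /chat endpoint; payload mirrors
# the curl example above. The response body is printed as-is because
# its exact schema is defined by the server.
import httpx

resp = httpx.post(
    "http://localhost:8000/chat",
    json={"prompt": "Explain quantum entanglement"},
    timeout=120.0,
)
resp.raise_for_status()
print(resp.json())
```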
The `/rag-chat` endpoint uses Qdrant-stored context to enhance replies:

```
curl -X POST http://localhost:8000/rag-chat \
  -H "Content-Type: application/json" \
  -d '{"user_id": "user123", "prompt": "Remind me what we discussed earlier about ML agents"}'
```

Every message is embedded and stored in Qdrant. When a new message arrives (see the sketch after this list):
- Qdrant searches for similar past messages by vector similarity
- Relevant context is retrieved
- A new, context-rich prompt is constructed
- Ollama generates a response with awareness of previous conversations
- Older memories are periodically summarized and compressed
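A condensed sketch of the retrieval steps in that flow, assuming the `qdrant-client` package and Ollama's `/api/embeddings` endpoint; the function, field, and collection names are illustrative, not the repo's actual `utils.py` code.

```python
# Illustrative retrieval step -- names and payload fields are assumptions,
# not this repo's actual utils.py implementation.
import httpx
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

OLLAMA_URL = "http://localhost:11434"
qdrant = QdrantClient(url="http://localhost:6333")

def embed(text: str) -> list[float]:
    """Embed text via Ollama's /api/embeddings endpoint."""
    resp = httpx.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": "embedding-model", "prompt": text},
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def build_prompt(user_id: str, message: str, limit: int = 5) -> str:
    """Retrieve this user's most similar past messages as context."""
    hits = qdrant.search(
        collection_name="memories",
        query_vector=embed(message),
        query_filter=Filter(
            must=[FieldCondition(key="user_id", match=MatchValue(value=user_id))]
        ),
        limit=limit,
    )
    context = "\n".join((hit.payload or {}).get("text", "") for hit in hits)
    return f"Relevant past conversation:\n{context}\n\nUser: {message}"
```

The resulting prompt would then go to Ollama for generation (as in the streaming sketch earlier), with the new message and reply embedded and upserted back into the collection; periodic summarization would compact the oldest points.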
```
5thMember/
│
├── main.py            # FastAPI app and API endpoints
├── utils.py           # RAG logic, summarization, Qdrant operations
├── db.py              # Database (Qdrant) user management
├── requirements.txt   # Python dependencies
├── .env               # Environment variables
└── README.md          # Documentation (you’re reading this!)
```
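Since `db.py` owns the Qdrant setup, here is a hedged sketch of what collection bootstrap can look like with a recent `qdrant-client`; it is not the repo's actual code, and the vector size must match your embedding model's output dimension (768 is only a placeholder).

```python
# Hypothetical collection bootstrap -- not the repo's actual db.py.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Create the collection once. The vector size must match the embedding
# model's output dimension (768 is just a placeholder here).
if not client.collection_exists("memories"):
    client.create_collection(
        collection_name="memories",
        vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    )
```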
You can use any Ollama-supported model:

```
ollama pull mistral
ollama pull phi3
ollama pull codellama
```

Then, update `.env`:

```
OLLAMA_TEXT_MODEL=phi3
```

- Use the `--reload` flag in Uvicorn for hot-reloading during development
- Check your Qdrant dashboard at http://localhost:6333/dashboard
- View container logs via Docker: `docker logs <container_id>`
MIT License © 2025 — Created by Utsav Acharya
“The best teammate never sleeps — it just keeps learning.”