Second Brain CRAG (Corrective RAG)

A personal “second brain” that lets you query your Markdown notes with a Corrective RAG (CRAG) pipeline: multi-query retrieval, relevance grading, optional web search, cross-encoder reranking, and a grounded final answer with sources.

🚀 What this does

Ingest notes once into a local vector index (FAISS)
Ask questions via a LangGraph pipeline:
1. Multi-query retrieval (LangChain MultiQueryRetriever)
2. Grade/filter irrelevant chunks
3. If nothing relevant → web search + scrape and append
4. Cross-encoder rerank → keep top-k
5. LLM generates answer grounded in retrieved context, with sources
Use a Streamlit front end to ask questions and view sources
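The five pipeline steps can be sketched as plain control flow. This is only an illustration and not the code in rag_graph.py: the function name crag_answer and every callable it accepts (retrieve, is_relevant, web_search, score, generate) are hypothetical stand-ins for the retriever, grader, web search helper, cross-encoder, and LLM.

```python
from typing import Callable, List, Tuple

Doc = Tuple[str, str]  # (source, text); a simplified stand-in for a document chunk


def crag_answer(
    question: str,
    retrieve: Callable[[str], List[Doc]],
    is_relevant: Callable[[str, Doc], bool],
    web_search: Callable[[str], List[Doc]],
    score: Callable[[str, Doc], float],
    generate: Callable[[str, List[Doc]], str],
    top_k: int = 3,
) -> dict:
    """Hypothetical sketch of the corrective RAG control flow."""
    # Steps 1-2: retrieve candidates, then keep only chunks the grader accepts.
    docs = [d for d in retrieve(question) if is_relevant(question, d)]

    # Step 3: corrective fallback, used only when nothing survived grading.
    used_web = False
    if not docs:
        docs = web_search(question)
        used_web = True

    # Step 4: rerank by score (higher is better) and keep the top_k chunks.
    ranked = sorted(docs, key=lambda d: score(question, d), reverse=True)[:top_k]

    # Step 5: generate an answer from the remaining context and list its sources.
    return {
        "answer": generate(question, ranked),
        "sources": [source for source, _ in ranked],
        "used_web": used_web,
    }


if __name__ == "__main__":
    # Toy stubs so the sketch runs without any model or network access.
    notes = [
        ("notes/faiss.md", "FAISS stores dense vectors"),
        ("notes/tea.md", "Green tea brewing times"),
    ]
    result = crag_answer(
        "What does FAISS store?",
        retrieve=lambda q: notes,
        is_relevant=lambda q, d: "FAISS" in d[1],
        web_search=lambda q: [("https://example.com", "placeholder web result")],
        score=lambda q, d: float(len(d[1])),
        generate=lambda q, ds: " ".join(text for _, text in ds),
    )
    print(result)
```

Passing the stages in as callables keeps the control flow testable without models; in the real project each one would presumably wrap a LangGraph node.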

📁 Project Structure

second-brain/
│
├── app.py              # Streamlit UI
├── rag_graph.py        # LangGraph CRAG pipeline
├── retriever.py        # FAISS loader, retriever, reranker, generator
├── grader.py           # LLM relevance grader (yes/no)
├── ingest.py           # Build FAISS index from Markdown notes
├── web_search.py       # Web search + scraping helpers
│
├── data/
│   └── notes/          # Your Markdown notes
│
└── vectorstore/        # FAISS index (local only)

⚙️ Setup

1) Create & activate a virtual environment (Windows PowerShell)

python -m venv venv
.\venv\Scripts\Activate.ps1

2) Install dependencies

pip install -U pip
pip install streamlit langgraph langchain langchain-community langchain-openai langchain-huggingface sentence-transformers faiss-cpu python-dotenv

3) Add environment variables

Create a .env file in the project root:

OPENROUTER_API_KEY=your_key_here
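One way the key might be read at startup is sketched below. The helper require_env is hypothetical and not part of the project; the sketch assumes python-dotenv (already in the dependency list) is used to load .env, and falls back to the shell environment if it is not installed.

```python
import os

try:
    # load_dotenv() looks for a .env file and copies its entries into
    # os.environ without overriding variables that are already set.
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    pass  # python-dotenv missing: rely on variables exported in the shell


def require_env(name: str) -> str:
    """Return an environment variable's value or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it in your shell"
        )
    return value
```

Calling require_env("OPENROUTER_API_KEY") before building the LLM client surfaces a missing key immediately instead of as an authentication error mid-query.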

📥 Ingest your notes

Place your Markdown notes inside:

second-brain/data/notes/

Then run:

python ingest.py

This builds/updates the FAISS index in:

second-brain/vectorstore/
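The first half of ingestion, reading notes and cutting them into chunks, might look roughly like the sketch below. It is an assumption-laden illustration: ingest.py is not shown here, the names split_text and iter_note_chunks are hypothetical, and the chunk size and overlap are arbitrary values. The embedding and FAISS indexing step is left out.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def split_text(text: str, chunk_size: int = 800, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def iter_note_chunks(notes_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (relative_path, chunk) for every Markdown file under notes_dir."""
    root = Path(notes_dir)
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for chunk in split_text(text):
            # Keeping the relative path lets the final answer cite its source note.
            yield path.relative_to(root).as_posix(), chunk
```

For example, list(iter_note_chunks("data/notes")) would give the (source, chunk) pairs that an embedding step could then index.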

🧠 Run the CRAG pipeline (CLI)

python rag_graph.py

🌐 Run the Streamlit UI

streamlit run app.py

If you encounter:

ModuleNotFoundError: No module named 'torchvision'

Run Streamlit with the file watcher disabled:

streamlit run app.py --server.fileWatcherType none

🔒 Notes on Safety

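vectorstore/index.pkl is a pickle file, and unpickling untrusted data can execute arbitrary code
Loading FAISS with allow_dangerous_deserialization=True is therefore only safe for an index you built yourself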
Treat vectorstore/ as local-only
Do NOT commit model/index artifacts
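One possible extra guard, sketched below, is to record a fingerprint of vectorstore/ right after ingestion and check it before unpickling. This is not something the project is known to do; fingerprint and verify_index are hypothetical helpers, and the check only helps if the recorded digest is stored somewhere an attacker cannot also modify.

```python
import hashlib
from pathlib import Path


def fingerprint(index_dir: str) -> str:
    """Return a SHA-256 digest over the names and bytes of all files in index_dir."""
    digest = hashlib.sha256()
    for path in sorted(Path(index_dir).rglob("*")):
        if path.is_file():
            digest.update(path.relative_to(index_dir).as_posix().encode("utf-8"))
            digest.update(path.read_bytes())
    return digest.hexdigest()


def verify_index(index_dir: str, expected: str) -> None:
    """Raise if the index on disk no longer matches the recorded fingerprint."""
    if fingerprint(index_dir) != expected:
        raise ValueError(f"{index_dir} changed since ingest; refusing to unpickle it")


# Only after verify_index() passes would a loader go on to call
# FAISS.load_local(index_dir, embeddings, allow_dangerous_deserialization=True).
```

A mismatch means the index files differ from what ingestion wrote, so rebuilding with python ingest.py is safer than loading them.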

🚫 .gitignore (Recommended)

venv/
__pycache__/
*.pyc
vectorstore/

(Add any private note folders if needed)

📄 License

Apache-2.0

📝 One-liner Description

A personal Second Brain built with a Corrective RAG (CRAG) pipeline: multi-query retrieval, grading, optional web search, cross-encoder reranking, and a Streamlit UI for querying Markdown notes.
