A beautiful, interactive web application for browsing AI research papers from arXiv and chatting with them using AI-powered PDF chatbots.
Run everything on your own machine with complete privacy. Uses Ollama for local LLM inference and RAG for accurate answers.
- File: `index.html` - Local version with Python backend
- Privacy: All processing happens locally - no external API calls
- Cost: Completely free - no API costs
- Models: Use any Ollama model (llama3, mistral, etc.)
Uses HuggingFace Spaces for chatbot functionality (requires internet).
- File: `index-cloud.html` - Cloud version
- Convenience: No setup required, just open in browser
- Dependency: Requires HuggingFace services to be online
- Complete Privacy: All data stays on your machine
- RAG-Powered: Advanced Retrieval-Augmented Generation for accurate answers
- Vector Embeddings: Uses ChromaDB for semantic search
- Source Citations: Shows relevant paper sections used for each answer
- Conversation History: Maintains context across questions
- PDF Processing: Automatic text extraction and intelligent chunking
- Browse Research Papers: Curated collection of cutting-edge AI research papers organized by category
  - LLM Reasoning & Training
  - AI Safety & Alignment
  - Vision-Language Models
- Select Papers: Choose from various influential papers with detailed descriptions
- Download PDFs: Direct links to arXiv papers
- Integrated Experience: Seamlessly download papers and chat with them
Prerequisites:
- Python 3.8 or higher
- Ollama installed on your system
Step 1: Install Ollama
```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or download from https://ollama.com
```
Step 2: Pull an LLM Model
```bash
# Recommended: Llama 3 (4.7GB)
ollama pull llama3

# Alternatives:
# ollama pull mistral    # Smaller, faster
# ollama pull llama2     # Older but reliable
# ollama pull codellama  # Good for technical papers
```
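Before wiring up the backend, you can sanity-check the pulled model directly from the Ollama CLI:

```bash
# One-off prompt to confirm the model loads and responds
ollama run llama3 "Summarize retrieval-augmented generation in one sentence."
```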
Step 3: Install Python Dependencies
```bash
cd backend
pip install -r requirements.txt
```
Step 4: Start the Backend Server
```bash
cd backend
python server.py
```
The server will start on http://localhost:5000.
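Once it's running, you can confirm the API is up with the health endpoint listed in the API section below:

```bash
curl http://localhost:5000/api/health
```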
Step 5: Open the Frontend
```bash
# In a new terminal, from the project root
python -m http.server 8000
```
Then open http://localhost:8000 in your browser.
No setup required! Simply open `index-cloud.html` in your browser.
- Start Backend: Make sure the Python backend is running (`python backend/server.py`)
- Open Frontend: Navigate to `http://localhost:8000` in your browser
- Browse Papers: Explore the curated collection of AI research papers
- Download PDF: Click "Download PDF" on any paper that interests you
- Open Chat: Click "Open Chat Interface"
- Upload PDF: Click "Upload PDF" and select the downloaded paper
- Ask Questions: Start chatting! The AI will answer based on the paper content
- View Sources: Expand source sections to see which parts of the paper were used
Frontend:
- React 18: Modern UI framework
- Tailwind CSS: Beautiful, responsive styling
- Babel Standalone: In-browser JSX compilation
Backend:
- Flask: Lightweight Python web framework
- Ollama: Local LLM inference
- LangChain: LLM application framework
- ChromaDB: Vector database for embeddings
- Sentence Transformers: Text embedding generation
- PyPDF2: PDF text extraction
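For reference, a plausible `backend/requirements.txt` matching this stack (package names only; the real file pins versions and may include extras such as flask-cors):

```
flask
langchain
chromadb
sentence-transformers
PyPDF2
```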
LLM Reasoning & Training:
- LightReasoner - Training large models using small models
- BoostStep - Mathematical reasoning enhancement
- Impact of Reasoning Step Length
- Divide and Conquer strategies
- Reverse Curriculum RL
AI Safety & Alignment:
- Comprehensive survey of alignment techniques
- Constitutional AI and DPO methods
Vision-Language Models:
- AlignVLM - Bridging vision and language
- DeepSeek-VL2 - Mixture-of-Experts approach
- Vision-Language-Action Models
- Chart Understanding with MMC
The Flask backend provides the following REST API endpoints:
- `GET /api/health` - Health check
- `GET /api/status` - Get system status (PDF loaded, chunk count)
- `POST /api/upload` - Upload and process a PDF file
- `POST /api/chat` - Send a message and get an AI response
- `POST /api/clear` - Clear conversation history
- `POST /api/reset` - Reset the system (clear all data)
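A typical session from the command line might look like this (the form field `file` and JSON field `message` are assumptions based on the endpoint descriptions, not confirmed from `server.py`):

```bash
# Check the backend is up
curl http://localhost:5000/api/health

# Upload a paper (hypothetical form field name "file")
curl -F "file=@paper.pdf" http://localhost:5000/api/upload

# Ask a question about it (hypothetical JSON field "message")
curl -X POST http://localhost:5000/api/chat \
     -H "Content-Type: application/json" \
     -d '{"message": "What problem does this paper address?"}'
```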
```
┌─────────────────┐
│ Frontend │ React SPA
│ (index.html) │ - Browse papers
│ │ - Upload PDF
└────────┬────────┘ - Chat interface
│
│ HTTP/REST
▼
┌─────────────────┐
│ Flask API │ Python Backend
│ (server.py) │ - PDF upload
└────────┬────────┘ - Request routing
│
┌────┴────┐
▼ ▼
┌─────────┐ ┌──────────┐
│ PDF │ │ RAG │
│Processor│ │ Engine │
└─────────┘ └────┬─────┘
│
┌────────┴────────┐
▼ ▼
┌─────────┐ ┌─────────┐
│ChromaDB │ │ Ollama │
│Vectors │ │ LLM │
└─────────┘  └─────────┘
```
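To make the RAG Engine box concrete, here is a minimal sketch of the retrieve-then-generate loop it represents, assuming ChromaDB for retrieval and Ollama's HTTP API for generation. The function and collection names are illustrative, not the actual `rag_engine.py` code:

```python
import requests
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # same embedder the backend uses
client = chromadb.Client()
collection = client.get_or_create_collection("paper_chunks")  # hypothetical name

def answer(query: str, n_context: int = 3) -> str:
    # 1. Embed the question and retrieve the most similar PDF chunks.
    query_vec = embedder.encode(query).tolist()
    hits = collection.query(query_embeddings=[query_vec], n_results=n_context)
    context = "\n\n".join(hits["documents"][0])

    # 2. Ground the local LLM in the retrieved context.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
    )
    return resp.json()["response"]
```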
Edit `backend/rag_engine.py` and change the model name:
```python
rag_engine = RAGEngine(
    model_name="mistral",  # Change this
    embedding_model="all-MiniLM-L6-v2"
)
```
Edit `backend/pdf_processor.py` (see the chunking sketch after this section):
```python
processor = PDFProcessor(
    chunk_size=1500,   # Increase for more context
    chunk_overlap=300  # Increase for better continuity
)
```
Edit `backend/server.py` in the chat endpoint:
```python
response = rag_engine.chat(
    query=message,
    conversation_history=conversations[session_id],
    n_context=5  # Retrieve more chunks
)
```
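For intuition on the `chunk_size` / `chunk_overlap` settings above, here is a standalone sketch using LangChain's `RecursiveCharacterTextSplitter`; the actual splitter used in `pdf_processor.py` may differ:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# ~1500-character chunks overlapping by 300 characters, so text cut at a
# chunk boundary reappears intact at the start of the next chunk.
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=300)
chunks = splitter.split_text(open("paper.txt").read())
print(len(chunks), "chunks; first chunk starts:", chunks[0][:60])
```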
Backend won't start:
- Make sure all dependencies are installed: `pip install -r backend/requirements.txt`
- Check if port 5000 is available: `lsof -i :5000`
- Ensure Ollama is running: `ollama list`

Ollama errors:
- Start the Ollama service: `ollama serve`
- Pull the model: `ollama pull llama3`
- Check Ollama is reachable: `curl http://localhost:11434`

Slow responses:
- Try a smaller model: `ollama pull mistral`
- Reduce context chunks in `server.py` (change `n_context` to 2)
- Consider using GPU acceleration for Ollama
PDF upload fails:
- Check file size (max 50MB)
- Ensure the PDF is not password protected
- Try a different PDF
- GPU Acceleration: Ollama automatically uses GPU if available (NVIDIA, Apple Silicon)
- Model Selection: Smaller models (mistral) are faster, larger models (llama3) are more accurate
- Chunk Tuning: Balance between chunk_size (more context) and response time
- Embeddings: The `all-MiniLM-L6-v2` model is fast and lightweight. For better quality, try `all-mpnet-base-v2` (see the sketch below)
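As a quick illustration of the trade-off, the two embedders produce different-sized vectors (larger generally means better retrieval quality but slower encoding):

```python
from sentence_transformers import SentenceTransformer

fast = SentenceTransformer("all-MiniLM-L6-v2")
good = SentenceTransformer("all-mpnet-base-v2")
print(fast.encode("attention is all you need").shape)  # (384,)
print(good.encode("attention is all you need").shape)  # (768,)
```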
Deploy the static HTML file anywhere:
- GitHub Pages: Just commit and enable GitHub Pages
- Netlify: Drag and drop `index-cloud.html`
- Vercel: Deploy with zero configuration
For production deployment:
- Use Gunicorn instead of Flask dev server
- Add nginx as reverse proxy
- Consider Docker for easy deployment
- Set up systemd service for auto-start
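A minimal Gunicorn invocation, assuming the Flask application object in `server.py` is named `app` (an assumption, not confirmed from the source):

```bash
pip install gunicorn
# 2 workers, listening on the port the frontend expects
gunicorn --chdir backend --workers 2 --bind 0.0.0.0:5000 server:app
```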
This project is open source and available for educational purposes.
- Local AI powered by Ollama
- Cloud chatbots powered by HuggingFace Spaces
- Papers sourced from arXiv
- Icons inspired by Lucide Icons
- RAG implementation using LangChain and ChromaDB