Real-time AI-powered meeting transcription and smart assistance platform
MindSync 2.0 is an intelligent meeting assistant that provides real-time transcription, smart suggestions, and comprehensive meeting management with voice cloning capabilities.
- Real-time Transcription: Live audio-to-text using VOSK and Whisper AI
- AI Assistant Mode: Intelligent suggestions and responses during meetings
- Voice Cloning: Personalized TTS with voice synthesis
- Meeting Management: Complete CRUD operations for meeting records
- Chat Interface: Interactive AI-powered conversation
- Multi-format Audio: Support for various audio formats and real-time streaming
- Dual Transcription Engine: VOSK for real-time + Whisper for accuracy
- WebSocket Communication: Real-time bidirectional data flow
- Vector Search: Semantic search across meeting content
- Pronunciation Training: Interactive pronunciation coaching
- REST API: Comprehensive backend API with FastAPI
- Modern Frontend: React with TypeScript and Vite
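The dual-engine design pairs VOSK's low latency with Whisper's accuracy: show a fast draft immediately, then replace it with the slower, more accurate decode. A minimal sketch of that pattern (the engine callables below are stand-ins, not the real VOSK/Whisper APIs):

```python
def dual_transcribe(audio: bytes, fast_engine, accurate_engine) -> dict:
    """Dual-engine pattern (sketch): emit a low-latency draft first,
    then refine it with a slower, more accurate pass.
    fast_engine / accurate_engine are placeholders for VOSK / Whisper."""
    draft = fast_engine(audio)       # shown to the user immediately
    final = accurate_engine(audio)   # replaces the draft when ready
    return {"draft": draft, "final": final}

# Stub engines stand in for the real model calls:
result = dual_transcribe(b"pcm-bytes",
                         lambda a: "helo wurld",
                         lambda a: "hello world")
```

In the real pipeline the accurate pass would run asynchronously so the draft is never blocked on it.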
Modern sidebar navigation with organized sections and clean interface
Intelligent AI suggestions and responses during meetings
AI-powered conversation with context awareness
Personalized voice profile management and creation
Real-time text-to-speech testing with custom voice profiles
- Python: 3.11+ (for backend)
- Node.js: 18+ (for frontend)
- FFmpeg: For audio processing
- Ollama: For local LLM support (optional)
- macOS (tested)
- Linux (Docker recommended)
- Windows (Docker recommended)
# Clone the repository
git clone <repository-url>
cd MindSync2.0
# Make scripts executable
chmod +x start.sh stop.sh dev.sh
# Start everything in background
./start.sh

# Using Docker Compose
docker-compose up -d
# Check status
docker-compose ps

# Backend setup
cd meeting-summarizer-app/backend
python -m venv venv311
source venv311/bin/activate # On Windows: venv311\Scripts\activate
pip install -r requirements.txt
# Frontend setup
cd ../frontend
npm install
# Run backend (Terminal 1)
cd meeting-summarizer-app/backend
python run_server.py
# Run frontend (Terminal 2)
cd meeting-summarizer-app/frontend
npm run dev

- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Click "Start Recording" on the main interface
- Speak into your microphone
- See live transcription appear in real-time
- AI suggestions will appear automatically
- Create: Upload audio files or start live recording
- View: Browse all meetings with search and filters
- Edit: Update meeting details and transcriptions
- Delete: Remove meetings and associated data
- Navigate to TTS section
- Upload reference audio (your voice)
- Enter text to synthesize
- Generate personalized speech
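Behind those steps, the frontend posts the text and a voice-profile reference to POST /tts/synthesize. A hedged sketch of building such a request body (the field names "text" and "voice" are illustrative assumptions, not the documented schema; check /docs for the real one):

```python
import json

def build_tts_payload(text: str, voice_profile: str) -> bytes:
    """JSON body for POST /tts/synthesize.
    Field names ("text", "voice") are illustrative guesses."""
    return json.dumps({"text": text, "voice": voice_profile}).encode("utf-8")

payload = build_tts_payload("Hello from MindSync", "my-voice")
```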
- Ask questions about meeting content
- Get AI-powered insights and summaries
- Interactive conversation with context awareness
# Start services in background
./start.sh
# Stop all services
./stop.sh
# Development toolkit
./dev.sh status # Check service status
./dev.sh logs # View real-time logs
./dev.sh test # Test API endpoints
./dev.sh clean   # Clean up logs and PIDs

# Start with Docker
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
# Rebuild containers
docker-compose up --build -d

# Install as system service (Linux/macOS)
sudo cp mindsync.service /etc/systemd/system/
sudo systemctl enable mindsync
sudo systemctl start mindsync
# Check service status
sudo systemctl status mindsync

MindSync2.0/
├── README.md                 # This file
├── DEPLOYMENT.md             # Detailed deployment guide
├── docker-compose.yml        # Docker orchestration
├── Dockerfile                # Container definition
├── start.sh                  # Main startup script
├── stop.sh                   # Shutdown script
├── dev.sh                    # Development toolkit
├── mindsync.service          # Systemd service
├── meeting-summarizer-app/
│   ├── backend/              # FastAPI backend
│   │   ├── app/
│   │   │   ├── main.py       # FastAPI application
│   │   │   ├── models/       # Database models
│   │   │   ├── routers/      # API endpoints
│   │   │   ├── services/     # Business logic
│   │   │   └── utils/        # Utility functions
│   │   ├── config.py         # Configuration
│   │   ├── requirements.txt  # Python dependencies
│   │   └── run_server.py     # Server entry point
│   ├── frontend/             # React frontend
│   │   ├── src/
│   │   │   ├── App.tsx       # Main application
│   │   │   ├── components/   # React components
│   │   │   └── assets/       # Static assets
│   │   ├── package.json      # Node dependencies
│   │   └── vite.config.ts    # Vite configuration
│   └── vosk-model/           # Speech recognition model
└── uploads/                  # User uploaded files
- GET /docs - API documentation
- POST /upload-audio - Upload audio for transcription
- GET /meetings - List all meetings
- POST /meetings - Create new meeting
- WebSocket /ws/real-time-transcribe - Real-time transcription
- POST /chat - AI chat interface
- POST /tts/synthesize - Text-to-speech synthesis
- POST /pronunciation/score - Pronunciation scoring
- GET /audio/(unknown) - Serve audio files
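Once the backend is up, these endpoints can be called from any HTTP client. A minimal stdlib sketch for creating a meeting (the JSON field name is an assumption; consult /docs for the actual schema):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def build_create_meeting_request(title: str) -> urllib.request.Request:
    """Build a POST /meetings request; the payload schema is an assumption."""
    body = json.dumps({"title": title}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/meetings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_create_meeting_request("Weekly sync")
# urllib.request.urlopen(req) sends it once the backend is running
```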
# Backend Configuration
BACKEND_HOST=0.0.0.0
BACKEND_PORT=8000
DATABASE_URL=sqlite:///uploads/meetings.db
# Frontend Configuration
FRONTEND_HOST=localhost
FRONTEND_PORT=3000
# AI Configuration
OLLAMA_BASE_URL=http://localhost:11434
LLM_MODEL=llama3.2:latest

- Sample Rate: 16 kHz (VOSK and Whisper)
- Channels: Mono
- Format: PCM, WAV, MP3, WebM supported
- Chunk Size: 1024 bytes for real-time processing
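The 1024-byte chunk size above is easy to reproduce client-side. A sketch of slicing raw PCM into streaming-sized chunks for the real-time WebSocket (pure Python; no audio library assumed):

```python
def chunk_pcm(audio: bytes, chunk_size: int = 1024):
    """Yield fixed-size chunks of raw PCM; the final chunk may be shorter."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]

# 4 KiB of silence splits cleanly into four 1024-byte chunks:
chunks = list(chunk_pcm(b"\x00" * 4096))
```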
# Check port availability
./dev.sh test
# View detailed logs
./dev.sh logs
# Clean up and restart
./dev.sh clean
./start.sh

- Ensure microphone permissions are granted
- Check browser audio settings
- Verify FFmpeg installation:
ffmpeg -version
# Rebuild containers
docker-compose down
docker-compose up --build -d
# Check container logs
docker-compose logs backend
docker-compose logs frontend

- Check firewall settings
- Verify backend is running on port 8000
- Test with:
curl http://localhost:8000/docs
- Use Docker for consistent performance
- Ensure adequate RAM (4GB+ recommended)
- SSD storage recommended for large audio files
- Close unnecessary browser tabs during recording
- Fork the repository
- Create a feature branch:
git checkout -b feature-name

- Make changes and test thoroughly
- Run the test suite:
./dev.sh test

- Submit a pull request
- Python: Follow PEP 8, use type hints
- TypeScript: Use strict mode, proper typing
- Git: Conventional commit messages
- Testing: Maintain test coverage > 80%
This project is licensed under the MIT License - see the LICENSE file for details.
- VOSK: Open-source speech recognition
- OpenAI Whisper: Advanced transcription accuracy
- FastAPI: Modern Python web framework
- React: Frontend user interface
- TTS (Text-to-Speech): Voice synthesis capabilities
- Deployment Guide: See DEPLOYMENT.md
- API Reference: http://localhost:8000/docs
- Development Tools: Use ./dev.sh for common tasks
- Check the troubleshooting section above
- Review logs with ./dev.sh logs
- Test connectivity with ./dev.sh test
- Create an issue with detailed error information
Built with ❤️ for seamless meeting experiences