A powerful, voice-controlled AI assistant with advanced automation capabilities, built with Python and PyQt5.
- Voice Recognition - Hands-free operation with speech-to-text
- Text-to-Speech - Natural voice responses
- Smart Decision Making - AI-powered command categorization using Cohere
- GUI Interface - Modern PyQt5 chat interface
- System Commands - Volume control, shutdown, mute/unmute
- Application Control - Open/close apps, web automation
- Web Search - Google search with real-time results
- YouTube Integration - Search and play videos
- Set Reminders - "Remind me to call mom at 5:30 PM"
- Timer Functions - "Set a timer for 5 minutes"
- List Management - View active reminders and timers
- Persistent Storage - Data saved across app restarts
- Background Notifications - Voice alerts when time is up
- Text-to-Image - Generate images from descriptions
- Stable Diffusion - High-quality AI art using Hugging Face API
- Auto-Display - Generated images open automatically
- Organized Storage - Images saved in `Data/Generated_Images/`
- Conversational AI - Natural language conversations
- Real-time Search - Up-to-date information retrieval
- Chat History - Persistent conversation logs
- Portable Deployment - Run anywhere with Docker
- GUI & Headless Modes - Flexible deployment options
- Cross-platform - Windows, Linux, Mac compatibility
```bash
git clone https://github.com/Developer-Tanay/Jarvis-Assistant.git
cd Jarvis-Assistant
pip install -r Requirements.txt
```

Create a `.env` file in the root directory:

```env
# Required API Keys
COHERE_API_KEY=your_cohere_api_key_here
HUGGINGFACE_API_KEY=your_huggingface_api_key_here
GROQ_API_KEY=your_groq_api_key_here

# Personal Settings
USERNAME=YourName
ASSISTANT_NAME=Jarvis
INPUT_LANGUAGE=en
ASSISTANT_VOICE=David
```

Run the assistant:

```bash
python Main.py
```

- Visit Cohere.ai
- Sign up for free account
- Get API key from dashboard
- Add to `.env` file
- Visit Hugging Face
- Create account and get access token
- Add to `.env` file
- Visit Groq
- Sign up and get API key
- Add to `.env` file
"Jarvis, remind me to call mom at 5:30 PM"
"Set a timer for 10 minutes"
"What are my reminders?"
"Show my active timers"
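A command like "Set a timer for 10 minutes" has to be turned into a duration before anything can be scheduled. A minimal sketch of such a parser — the regex and function name are illustrative assumptions, not the project's actual `Backend/ReminderTimer.py` implementation:

```python
import re

def parse_timer_seconds(command):
    """Extract a duration in seconds from phrases like 'set a timer for 10 minutes'."""
    match = re.search(r"(\d+)\s*(second|minute|hour)s?", command.lower())
    if not match:
        return None  # no recognizable duration in the utterance
    value, unit = int(match.group(1)), match.group(2)
    return value * {"second": 1, "minute": 60, "hour": 3600}[unit]
```

The resulting number of seconds could then be handed to something like `threading.Timer` to fire the background voice alert.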
"Generate an image of a sunset over mountains"
"Create a picture of a cute robot"
"Make an image showing a futuristic city"
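Under the hood, requests like these reach Stable Diffusion through the Hugging Face Inference API. A hedged sketch of that call — the endpoint shape matches the public inference API, but the file-naming helper and function names are assumptions; the project's real code lives in `Backend/ImageGeneration.py`:

```python
import json
import os
import re
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/runwayml/stable-diffusion-v1-5"

def prompt_to_filename(prompt):
    """Turn a free-text prompt into a safe path under Data/Generated_Images/."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")[:50]
    return os.path.join("Data", "Generated_Images", slug + ".png")

def generate_image(prompt, api_key):
    """POST the prompt to the inference endpoint and save the returned image bytes."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps({"inputs": prompt}).encode(),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
    )
    path = prompt_to_filename(prompt)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with urllib.request.urlopen(request) as response, open(path, "wb") as f:
        f.write(response.read())
    return path
```

Calling `generate_image("a sunset over mountains", api_key)` would save the result under `Data/Generated_Images/` for auto-display.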
"Mute my computer"
"Unmute"
"Shutdown computer"
"Search Google for Python tutorials"
"Play relaxing music on YouTube"
"What's the latest news about AI?"
"Hello Jarvis, how are you?"
"Tell me a joke"
"What's the weather like?"
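Before any of these commands run, `Backend/Model.py` asks Cohere to categorize the utterance so it can be routed to the right handler. A simplified keyword-based stand-in for that classifier — the category names and keywords here are illustrative assumptions, not the project's actual prompt:

```python
def categorize(command):
    """Route an utterance to a handler category by keyword (simplified local fallback)."""
    text = command.lower()
    rules = [
        ("reminder", ("remind", "reminder")),
        ("timer", ("timer",)),
        ("image", ("image", "picture", "generate")),
        ("system", ("mute", "unmute", "shutdown", "volume")),
        ("search", ("search", "play", "news")),
    ]
    for category, keywords in rules:
        if any(word in text for word in keywords):
            return category
    return "chat"  # default: hand off to the conversational AI
```

The real decision-maker is an LLM call, which handles phrasings no keyword list can cover; this fallback only shows the routing shape.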
```
Jarvis-Assistant/
├── Main.py                     # Main application entry point
├── Requirements.txt            # Python dependencies
├── .env                        # Environment variables (create this)
├── README.md                   # This file
├── Backend/                    # Core AI and automation modules
│   ├── Model.py                # Decision-making AI (Cohere)
│   ├── Automation.py           # Command execution
│   ├── ReminderTimer.py        # Reminder/timer system
│   ├── ImageGeneration.py      # AI image generation
│   ├── Chatbot.py              # Conversational AI
│   ├── SpeechToText.py         # Voice recognition
│   ├── TextToSpeech.py         # Voice synthesis
│   └── RealtimeSearchEngine.py # Web search
├── Frontend/                   # User interface
│   ├── GUI.py                  # PyQt5 interface
│   └── Files/                  # GUI data files
├── Data/                       # Application data
│   ├── ChatLog.json            # Chat history
│   ├── reminders.json          # Active reminders
│   ├── timers.json             # Active timers
│   └── Generated_Images/       # AI-generated images
└── Docker/                     # Containerization
    ├── Dockerfile
    ├── docker-compose.yml
    └── run-docker.sh
```
```bash
# Build and run
docker-compose up --build

# Headless mode (no GUI)
docker-compose --profile headless up jarvis-headless
```

Use the provided script:

```bash
run-docker.bat build
run-docker.bat headless
```

Edit the `.env` file:

- `INPUT_LANGUAGE` - Recognition language (en, es, fr, etc.)
- `ASSISTANT_VOICE` - TTS voice name
Modify `Frontend/GUI.py` for:
- Theme colors
- Window size
- Chat appearance
Add new commands in:

- `Backend/Model.py` - For command recognition
- `Backend/Automation.py` - For command execution
- `PyQt5` - GUI framework
- `python-dotenv` - Environment variables
- `cohere` - AI decision making
- `requests` - HTTP requests and API communication
- `keyboard` - System control
- `pygame` - Audio playback
- `edge-tts` - Text-to-speech
- `speech_recognition` - Voice input
- `selenium` - Web automation
- `beautifulsoup4` - Web scraping
- `googlesearch-python` - Search integration
- `Pillow` - Image processing
- API Response Times
  - Cohere: ~1-2 seconds
  - Image Generation: ~10-30 seconds
  - Voice Recognition: ~2-3 seconds
- Resource Usage
  - RAM: ~200-400 MB
  - CPU: Low usage except during image generation
- Optimization
  - Use Docker for consistent performance
  - Close unused applications for better voice recognition
  - Ensure a stable internet connection for API calls
```bash
# Check microphone permissions
# Ensure pygame is installed
pip install pygame
```

- Verify API keys in the `.env` file
- Check API key permissions
- Ensure internet connectivity
- Confirm the Hugging Face API key
- Try the free model: `runwayml/stable-diffusion-v1-5`
- Check the network connection
```bash
# Install PyQt5
pip install PyQt5
```

- Ensure Docker Desktop is running
- Use headless mode on Windows
- Check volume mounts for data persistence
Enable detailed logging by adding to `.env`:

```env
DEBUG_MODE=True
```

- API Keys: Never commit the `.env` file to version control
- Local Data: All conversations are stored locally in `Data/`
- Network: Only communicates with the specified AI APIs
- Permissions: Requires microphone access for voice features
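The `DEBUG_MODE` flag above could gate log verbosity roughly like this — the wiring is an assumption, since the README only documents the variable itself:

```python
import logging
import os

def configure_logging():
    """Use DEBUG-level logging when DEBUG_MODE=True is set in the environment."""
    debug = os.environ.get("DEBUG_MODE", "False").lower() == "true"
    # force=True resets any handlers configured earlier (Python 3.8+)
    logging.basicConfig(level=logging.DEBUG if debug else logging.INFO, force=True)
    return logging.getLogger("jarvis")
```

With the flag unset or `False`, the assistant would log only informational messages.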
- Multi-language support
- Custom wake word detection
- Integration with smart home devices
- Advanced image editing capabilities
- Calendar integration
- Email automation
- Mobile app companion
This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
For issues and questions:
- Open GitHub issue
- Check troubleshooting section
- Review API documentation
Made with ❤️ for AI automation enthusiasts
