An AI-powered system that listens to improv performers in real-time, detects location context from speech, generates appropriate background images, and automatically sends them to QLab for display. Perfect for improv theaters wanting dynamic, responsive backdrops.
## Features

- 🎤 Real-time speech recognition - Listens to performers via microphone
- 🧠 AI location detection - Identifies settings from natural speech
- 🎨 Dynamic image generation - Creates backgrounds using DALL-E 3
- 📚 Smart library system - Reuses existing environments for speed
- 🎵 Ambient sound integration - Adds appropriate audio atmospheres
- 🎬 QLab automation - Seamlessly integrates with your theater tech setup
- ⚡ Intelligent rate limiting - Optimized for live performance
## Requirements

- macOS (required for QLab integration)
- Python 3.8+
- QLab 5 (show control software for theater audio/video)
- OpenAI API key with DALL-E access
- Microphone for speech input
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/improv-ai.git
   cd improv-ai
   ```

2. Set up a Python virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   # For macOS users (install PortAudio first):
   brew install portaudio
   # For Ubuntu/Debian users:
   # sudo apt-get install portaudio19-dev python3-pyaudio
   # For Windows users:
   # a PyAudio wheel may be needed from https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio
   pip install -r requirements.txt
   ```
**PyAudio installation issues?**

- macOS: `brew install portaudio`, then `pip install pyaudio`
- Linux: `sudo apt-get install portaudio19-dev`
- Windows: download a prebuilt wheel from the unofficial binaries page
- Alternative:

  ```bash
  pip install --global-option='build_ext' --global-option='-I/usr/local/include' --global-option='-L/usr/local/lib' pyaudio
  ```
4. Configure your OpenAI API key:

   ```bash
   cp .env.example .env
   # Edit .env and add your OpenAI API key
   ```

5. Set up QLab:
   - Open QLab 5
   - Enable OSC in QLab preferences
   - Create a new workspace for your show

6. Add ambient sounds (optional):

   ```bash
   python3 get_ambient_sounds.py  # See the guide for sound sources
   ```
## Usage

1. Start the system:

   ```bash
   python3 main.py                   # Full features (DALL-E 3 + ambient sounds)
   python3 main.py --fast            # Faster generation (DALL-E 2)
   python3 main.py --no-sounds       # Disable ambient sounds
   python3 main.py --auto-default 5  # Auto-default backdrop after 5 minutes
   ```

2. Begin your improv performance! The system will:
   - Listen for location mentions ("Let's go to the coffee shop")
   - Generate or reuse appropriate backgrounds
   - Send images to QLab automatically
   - Add ambient sound cues (if enabled)

3. Tech controls:
   - Press `d` + Enter to trigger the default backdrop
   - Press Ctrl+C to exit gracefully
## How It Works

### Speech Recognition

The system uses Python's SpeechRecognition library to continuously listen for speech patterns that indicate location changes:

```
# Examples of detected phrases:
"Let's go to the Italian restaurant" → italian_restaurant.png
"We're at the park now"              → park.png
"This coffee shop is crowded"        → coffee_shop.png
"Welcome to our office"              → office.png
```

### Location Detection

The AI analyzes speech context to extract reusable environment names:

- "Fancy Italian bistro" → `italian_restaurant`
- "Szechuan noodle place" → `chinese_restaurant`
- "Dark spooky forest" → `dark_forest`
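For intuition, the detection step can be pictured as a cue-phrase check followed by name normalization. This is only an illustrative sketch: the synonym table and `detect_environment` helper below are assumptions for demonstration, not the project's actual implementation (which asks an AI model to extract the environment name):

```python
import re
from typing import Optional

# Hypothetical synonym table; the real system derives these
# names from speech context via an AI model.
SYNONYMS = {
    "italian restaurant": "italian_restaurant",
    "italian bistro": "italian_restaurant",
    "szechuan noodle place": "chinese_restaurant",
    "spooky forest": "dark_forest",
    "coffee shop": "coffee_shop",
    "park": "park",
}

# Phrases that suggest the scene is naming a location.
LOCATION_CUES = re.compile(r"(let's go to|we're at|welcome to|this) ")

def detect_environment(transcript: str) -> Optional[str]:
    """Return a normalized environment name if the line mentions a known location."""
    text = transcript.lower()
    if not LOCATION_CUES.search(text):
        return None
    # Longest match first, so "italian restaurant" beats a bare "restaurant".
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        if phrase in text:
            return SYNONYMS[phrase]
    return None
```

For example, `detect_environment("Let's go to the Italian restaurant")` yields `italian_restaurant`, matching the filename convention shown above.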
### Image Generation & Library

- First mention: generates a new DALL-E 3 image (high quality)
- Subsequent mentions: instantly reuses the image from the library
- Optimized prompts: creates intimate, theater-appropriate backgrounds
### QLab Integration

Automatically creates and triggers QLab cues via AppleScript:

- Video cues for background images
- Audio cues for ambient sounds
- Auto-stops previous backgrounds
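As a rough illustration, an AppleScript snippet for creating and starting a video cue can be assembled as a string and handed to `osascript`. The cue properties below are assumptions based on QLab's AppleScript dictionary, not a copy of `qlab_integration.py`; check the dictionary in Script Editor for your QLab version:

```python
import subprocess

def build_video_cue_script(image_path: str) -> str:
    """Assemble AppleScript that adds and starts a video cue in QLab 5.

    Property names here are assumptions drawn from QLab's AppleScript
    dictionary and may need adjusting for your version.
    """
    return f'''tell application id "com.figure53.QLab.5"
  tell front workspace
    make type "Video"
    set newCue to last item of (selected as list)
    set file target of newCue to POSIX file "{image_path}"
    start newCue
  end tell
end tell'''

def send_to_qlab(image_path: str) -> None:
    # macOS only: execute the generated script via osascript.
    subprocess.run(["osascript", "-e", build_video_cue_script(image_path)],
                   check=True)
```

Generating the script as a string keeps it inspectable and loggable before anything is sent to the running QLab workspace.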
## Configuration

Set your API key in `.env`:

```
OPENAI_API_KEY=your_api_key_here
```

- Rate limiting: adjust `min_interval` in `main.py` (default: 15 seconds)
- Image quality: use the `--fast` flag for DALL-E 2 instead of DALL-E 3 (the default)
- Ambient sounds: use the `--no-sounds` flag to disable audio cues
- Auto-default: use `--auto-default N` to show the default backdrop after N minutes
- Speech sensitivity: modify `phrase_time_limit` in `speech_recognizer.py`
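The `min_interval` throttle can be pictured as a last-fired timestamp check. The sketch below assumes this is roughly how `main.py` gates generation requests; the class name is illustrative, and the clock is injected so the logic is testable:

```python
import time
from typing import Callable

class RateLimiter:
    """Allow at most one image generation every min_interval seconds."""

    def __init__(self, min_interval: float = 15.0,
                 clock: Callable[[], float] = time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_fired = float("-inf")  # first request is always allowed

    def allow(self) -> bool:
        now = self.clock()
        if now - self._last_fired >= self.min_interval:
            self._last_fired = now
            return True
        return False  # too soon: skip this generation request
```

During a show, detected locations that arrive inside the window are simply ignored rather than queued, which is what keeps API usage bounded during rapid scene work.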
## Project Structure

```
improv-ai/
├── main.py                 # Main application orchestrator
├── speech_recognizer.py    # Real-time speech recognition
├── image_generator.py      # AI image generation & library
├── sound_generator.py      # Ambient sound system
├── qlab_integration.py     # QLab AppleScript automation
├── get_ambient_sounds.py   # Sound collection utility
├── generated_images/       # Environment image library
├── generated_sounds/       # Ambient audio files
├── requirements.txt        # Python dependencies
└── README.md               # This file
```
## Ambient Sounds

The system supports ambient audio for immersive environments:

1. Run the sound collection guide:

   ```bash
   python3 get_ambient_sounds.py
   ```

2. Download sounds from:
   - Freesound.org (free, Creative Commons)
   - Zapsplat.com (professional quality)
   - YouTube Audio Library
   - AI generation tools (Suno, Udio)

3. Save each file as `<environment>_ambient.wav` in `generated_sounds/`:
   - `park_ambient.wav`
   - `restaurant_ambient.wav`
   - `coffee_shop_ambient.wav`
   - etc.
## Performance Tips

For the tech crew:

- Set up the QLab workspace before the show
- Test speech recognition levels during sound check
- Use the default backdrop feature (`d` + Enter) between scenes
- Monitor rate limiting: the system prevents API spam

For performers:

- Speak naturally: the system detects context, not commands
- Be specific about locations: "Italian restaurant" rather than just "restaurant"
- Allow ~3 seconds for new environments to generate
- Library reuse is instant for repeated locations

Network & API:

- Internet access is required for new image generation
- Library reuse works offline
- A 15-second rate limit prevents API overuse
- Smart caching balances quality with speed
## Microphone Tips

- Test your setup: run `python3 test_microphone.py` before shows
- Microphone placement: 2-6 feet from performers works best
- Use quality mics: USB microphones often outperform built-in ones
- Reduce echo: soft furnishings help absorb sound reflections
- Consistent volume: train performers to project consistently
- Clear enunciation: theater projection techniques work well
"Could not understand speech"
- Run microphone test:
python3 test_microphone.py - Check microphone permissions in System Settings
- Reduce background noise (close windows, turn off fans)
- Speak louder and more clearly
- Use a better quality microphone if possible
- Adjust settings in speech_recognizer.py:
energy_threshold: Lower for quieter environmentspause_threshold: Increase if cutting off too soonphrase_time_limit: Increase for longer sentences
"QLab connection failed"
- Ensure QLab 5 is running
- Enable OSC in QLab preferences
- Check that workspace is open
"OpenAI API error"
- Verify API key in
.envfile - Check API quota/billing
- Ensure DALL-E access is enabled
**Images too large/small**

- Modify prompt generation in `image_generator.py`
- Adjust QLab video cue settings
- Check the theater projector's resolution
## Contributing

Contributions are welcome! Areas for improvement:
- Additional ambient sound mappings
- Enhanced location detection
- Support for other theater software
- Multi-language speech recognition
- Custom prompt templates
## License

MIT License - see the LICENSE file for details.
## Acknowledgments

Created for improv theater communities. Special thanks to:
- OpenAI for DALL-E API
- Figure 53 for QLab
- The improv community for inspiration
## Links

- QLab by Figure 53
- OpenAI DALL-E
- Freesound.org - Ambient sounds
- Python SpeechRecognition
Made with ❤️ for the theater community