ars-ppi/Papercast

PaperCast

Voice-controlled academic paper reader - fetches from arXiv, summarizes with AI, reads aloud with TTS.

Features

  • arXiv Integration: Fetches papers based on your topics
  • AI Summarization: Multiple summary levels (brief, standard, detailed, technical)
  • Voice Control: Hands-free navigation with Vosk
  • Text-to-Speech: Natural voice output with VibeVoice
  • SQLite Storage: Persistent queue, history, and saved papers

Quick Start

1. Installation

# Clone the repository
git clone <repository-url>
cd Papercast

# Create virtual environment
python -m venv .venv

# Activate virtual environment
# Windows:
.venv\Scripts\activate
# Linux/Mac:
source .venv/bin/activate

# Install with dev dependencies
pip install -e ".[dev]"

2. Install AI Backends (Optional)

PaperCast supports real AI backends for summarization and TTS. These are optional - you can use mock backends for testing.

GitHub Copilot (for AI Summarization)

Requires a GitHub Copilot subscription:

  1. Install Copilot CLI: Follow the official guide at https://docs.github.com/en/copilot/how-tos/set-up/install-copilot-cli
  2. Verify installation: Run copilot --version to confirm it's in your PATH
  3. The github-copilot-sdk Python package is included in PaperCast dependencies

VibeVoice (for Text-to-Speech)

Requires manual installation from source (GPU recommended):

# Clone VibeVoice repository
git clone https://github.com/microsoft/VibeVoice.git

# Install with TTS dependencies
pip install -e "./VibeVoice[tts]"

# Note: Flash Attention 2 is NOT supported on Windows.
# On Linux with CUDA, you can optionally install flash-attention for better performance:
# pip install flash-attn --no-build-isolation
# On Windows, PaperCast automatically uses SDPA (Scaled Dot Product Attention) instead.
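The platform note above can be sketched as a small startup helper that picks an attention implementation. The function name and the `flash_attn` import probe are illustrative, not PaperCast's actual API:

```python
import platform

def pick_attention_impl() -> str:
    """Choose an attention backend: Flash Attention 2 when the optional
    flash-attn package is importable, SDPA otherwise."""
    if platform.system() == "Windows":
        return "sdpa"  # Flash Attention 2 is not supported on Windows
    try:
        import flash_attn  # noqa: F401 -- optional, Linux/CUDA only
        return "flash_attention_2"
    except ImportError:
        return "sdpa"
```

The returned string matches the values that Hugging Face Transformers accepts for its `attn_implementation` argument, which is a common way to wire this kind of selection into model loading.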

VibeVoice voice presets are in VibeVoice/demo/voices/streaming_model/. Available English voices include:

  • en-Carter_man, en-Davis_man, en-Frank_man, en-Mike_man (male)
  • en-Emma_woman, en-Grace_woman (female)

You can use short names like carter or emma - they will be automatically mapped to the full preset name.
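One plausible way the short-name mapping could work, using the presets listed above (PaperCast's actual lookup may differ):

```python
# Short voice names mapped to the full VibeVoice preset names above.
VOICE_PRESETS = {
    "carter": "en-Carter_man",
    "davis": "en-Davis_man",
    "frank": "en-Frank_man",
    "mike": "en-Mike_man",
    "emma": "en-Emma_woman",
    "grace": "en-Grace_woman",
}

def resolve_voice(name: str) -> str:
    """Accept either a short name ("emma") or a full preset name."""
    key = name.strip().lower()
    if key in VOICE_PRESETS:
        return VOICE_PRESETS[key]
    return name  # assume it is already a full preset name
```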

3. Download Vosk Model (Required for Voice Recognition)

PaperCast uses Vosk for offline voice recognition. You need to download a model:

# Create models directory
mkdir -p models

# Download a small English model (~50MB)
# Option 1: Using curl
curl -L -o models/vosk-model-small-en-us-0.15.zip \
  https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip models/vosk-model-small-en-us-0.15.zip -d models/
rm models/vosk-model-small-en-us-0.15.zip

# Option 2: Using PowerShell (Windows)
Invoke-WebRequest -Uri "https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip" -OutFile "models/vosk-model-small-en-us-0.15.zip"
Expand-Archive -Path "models/vosk-model-small-en-us-0.15.zip" -DestinationPath "models/"
Remove-Item "models/vosk-model-small-en-us-0.15.zip"

For better accuracy, you can use a larger model from: https://alphacephei.com/vosk/models
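An unpacked Vosk model directory contains `am/` and `conf/` subdirectories (see Troubleshooting below). A rough sanity check, as an illustrative sketch rather than PaperCast's actual validation:

```python
from pathlib import Path

def looks_like_vosk_model(model_dir: str) -> bool:
    """Return True if model_dir looks like an unpacked Vosk model
    (i.e. it contains am/ and conf/ subdirectories)."""
    p = Path(model_dir)
    return p.is_dir() and (p / "am").is_dir() and (p / "conf").is_dir()
```

Running this against your `PAPERCAST_VOSK_MODEL_PATH` before starting the app can catch the common mistake of pointing at the zip file or the parent `models/` folder instead of the model directory itself.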

4. Configuration

Copy the example config and customize:

cp .env.example .env

Required settings for first run:

# Set the Vosk model path (required)
PAPERCAST_VOSK_MODEL_PATH=models/vosk-model-small-en-us-0.15

# For testing without real AI summarization, use mock backend:
PAPERCAST_SUMMARIZER_BACKEND=mock

# For testing without real TTS, use mock backend:
PAPERCAST_VOICE_MODEL=mock

# Your research topics
PAPERCAST_TOPICS=["machine learning", "natural language processing"]
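Note that PAPERCAST_TOPICS is a JSON array, not a comma-separated list. A sketch of how such a value can be read from the environment, with the documented default as fallback (the helper is illustrative; PaperCast's settings layer may use a library such as pydantic instead):

```python
import json
import os

def read_topics(default=("machine learning", "nlp")) -> list:
    """Read PAPERCAST_TOPICS as a JSON array of strings, falling back
    to the documented default when unset or not valid JSON."""
    raw = os.environ.get("PAPERCAST_TOPICS")
    if not raw:
        return list(default)
    try:
        topics = json.loads(raw)
    except json.JSONDecodeError:
        return list(default)
    return [str(t) for t in topics]
```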

5. Run PaperCast

# Start the voice-controlled reader
papercast run

# With debug output
papercast run --debug

# Override topics from command line
papercast run --topic "transformers" --topic "large language models"

# Start in daily briefing mode
papercast run --briefing

All Commands

# Show current configuration
papercast config

# Start voice-controlled reader
papercast run

# Show version
papercast version

# Export saved papers (coming soon)
papercast export --format markdown

Configuration Options

All settings can be set via environment variables or .env file:

| Variable | Default | Description |
|---|---|---|
| PAPERCAST_TOPICS | ["machine learning", "nlp"] | arXiv search topics (JSON array) |
| PAPERCAST_FETCH_DAYS | 7 | Days to look back for papers (1-30) |
| PAPERCAST_MAX_PAPERS_PER_FETCH | 50 | Max papers per fetch (1-200) |
| PAPERCAST_SUMMARIZER_BACKEND | copilot | copilot or mock |
| PAPERCAST_DEFAULT_SUMMARY_LEVEL | standard | brief, standard, detailed, technical |
| PAPERCAST_VOICE_MODEL | vibevoice | vibevoice or mock |
| PAPERCAST_SPEECH_RATE | 1.0 | Speech rate (0.5-2.0) |
| PAPERCAST_WAKE_WORD | (none) | Optional wake word (e.g., "hey paper") |
| PAPERCAST_VOSK_MODEL_PATH | (none) | Path to Vosk model directory |
| PAPERCAST_DATABASE_PATH | papercast.db | SQLite database location |
| PAPERCAST_LOG_LEVEL | INFO | DEBUG, INFO, WARNING, ERROR |
| PAPERCAST_DEBUG | false | Enable debug mode |
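Several of these settings have documented numeric ranges. One simple way to enforce them is clamping; a minimal sketch with hypothetical names, not PaperCast's actual validation (which may instead reject out-of-range values):

```python
def clamp(value: float, lo: float, hi: float) -> float:
    """Clamp a numeric setting into its documented range."""
    return max(lo, min(hi, value))

# Ranges from the table above.
speech_rate = clamp(2.5, 0.5, 2.0)   # speech rate capped at 2.0
fetch_days = int(clamp(0, 1, 30))    # at least 1 day of lookback
max_papers = int(clamp(50, 1, 200))  # 50 is already in range
```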

Voice Commands

During playback, you can say:

| Command | Aliases | Action |
|---|---|---|
| next | skip, next paper | Go to next paper |
| back | previous, go back | Go to previous paper |
| pause | wait, hold on | Pause playback |
| resume | continue, go on | Resume playback |
| stop | - | Stop playback |
| save | bookmark, save this | Save paper to reading list |
| details | more details, tell me more | Re-read with detailed summary |
| brief | short, summary | Re-read with brief summary |
| repeat | again, read again | Repeat current paper |
| faster | speed up | Increase speech rate |
| slower | slow down | Decrease speech rate |
| search [topic] | find papers about... | Search for papers on topic |
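The alias table suggests a normalization step before dispatch. A sketch covering a subset of the commands above (the function and dictionary are illustrative, not PaperCast's internals):

```python
# Spoken phrases (including aliases) mapped to canonical commands,
# following the table above.
ALIASES = {
    "next": "next", "skip": "next", "next paper": "next",
    "back": "back", "previous": "back", "go back": "back",
    "pause": "pause", "wait": "pause", "hold on": "pause",
    "resume": "resume", "continue": "resume", "go on": "resume",
    "stop": "stop",
    "save": "save", "bookmark": "save", "save this": "save",
    "repeat": "repeat", "again": "repeat", "read again": "repeat",
    "faster": "faster", "speed up": "faster",
    "slower": "slower", "slow down": "slower",
}

def parse_command(utterance: str):
    """Return (command, argument). Search phrases carry the topic as
    the argument; unknown phrases yield (None, None)."""
    text = utterance.strip().lower()
    for prefix in ("search ", "find papers about "):
        if text.startswith(prefix):
            return "search", text[len(prefix):]
    return ALIASES.get(text), None
```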

Testing Mode (No External Dependencies)

To test PaperCast without Vosk, VibeVoice, or Copilot:

# Set all backends to mock in .env
PAPERCAST_SUMMARIZER_BACKEND=mock
PAPERCAST_VOICE_MODEL=mock

This uses mock implementations that simulate the real components.

Development

# Run tests
pytest

# Run tests with coverage
pytest --cov=papercast

# Type checking
mypy src/papercast

# Linting
ruff check .

# Auto-fix linting issues
ruff check . --fix

Project Structure

Papercast/
├── src/papercast/
│   ├── orchestrator/    # Central coordinator
│   ├── scraper/         # arXiv paper fetching
│   ├── queue/           # Paper queue management
│   ├── summarizer/      # AI summarization
│   ├── tts/             # Text-to-speech
│   ├── voice/           # Voice command recognition
│   ├── storage/         # SQLite persistence
│   ├── config.py        # Settings management
│   ├── models.py        # Data models
│   └── main.py          # CLI entry point
├── tests/               # Test suite
├── models/              # Vosk models (download separately)
├── .env                 # Your configuration
└── .env.example         # Configuration template

Troubleshooting

"Folder '' does not contain model files"

You need to download a Vosk model. See Download Vosk Model above.

"Failed to initialize Vosk"

Check that PAPERCAST_VOSK_MODEL_PATH points to the correct model directory (the folder containing am/, conf/, etc.).

No audio output

Make sure you have audio output devices configured. For testing, use PAPERCAST_VOICE_MODEL=mock.

Copilot CLI not found

The Copilot backend requires the GitHub Copilot CLI to be installed and in your PATH. Install it from: https://docs.github.com/en/copilot/how-tos/set-up/install-copilot-cli

For testing without Copilot, use PAPERCAST_SUMMARIZER_BACKEND=mock.

VibeVoice not installed

VibeVoice must be installed from source:

git clone https://github.com/microsoft/VibeVoice.git
pip install -e "./VibeVoice[tts]"

For testing without VibeVoice, use PAPERCAST_VOICE_MODEL=mock.

License

MIT
