AI meeting notes for macOS — real-time transcript & auto-summary, Vietnamese-optimized.
- Dual-stream recording — captures your mic (Me) and system audio (Them) simultaneously
- Real-time transcript — live speech-to-text as you speak, not after you finish
- Auto-generated summaries — structured notes with key points, decisions, and action items
- Multi-speaker support — extensible speaker model (Me, Them, Speaker C, D...) with color-coded pills
- Vietnamese-first — optimized for Vietnamese and mixed Vietnamese/English meetings
- Pause / Resume — pause recording without ending the meeting
- Privacy-first — no bot joins your calls, audio is never stored
- Menu bar controls — start, stop, and monitor from the macOS menu bar
- Search — full-text search across all meeting titles and transcripts
- Auto-cleanup — empty test meetings are purged on launch
```
brew tap sonpiaz/tap https://github.com/sonpiaz/homebrew-tap
brew install --cask pheme
```

Or build from source:

```
git clone https://github.com/sonpiaz/pheme.git
cd pheme
brew install xcodegen   # if not installed
make run
```

- Launch Pheme
- Grant microphone permission when prompted
- Grant Screen Recording permission in System Settings → Privacy & Security
- Enter your OpenAI API key in Settings (⌘,)
- Click Record — speak, and watch the transcript appear in real-time
```
┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│  Microphone  │────▶│ AudioChunker │────▶│   RealtimeAPI    │
│  (AVAudio)   │     │ (24kHz PCM)  │     │ (Me transcript)  │
└──────────────┘     └──────────────┘     └────────┬─────────┘
                                                   │
┌──────────────┐     ┌──────────────┐     ┌────────▼─────────┐
│ System Audio │────▶│ AudioChunker │────▶│   RealtimeAPI    │
│ (CoreAudio)  │     │ (24kHz PCM)  │     │ (Them transcript)│
└──────────────┘     └──────────────┘     └────────┬─────────┘
                                                   │
                                          ┌────────▼─────────┐
                                          │   GPT-4o-mini    │
                                          │  (Summary Gen)   │
                                          └──────────────────┘
```
Audio capture uses AVAudioEngine for the mic and Core Audio Taps (CATapDescription) for system audio — capturing other apps' output without a bot joining your call.
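The Core Audio Taps setup can be sketched roughly as below. This is a simplified illustration, not the app's actual `SystemAudioRecorder` code: a real implementation must also create an aggregate device containing the tap and install an IO proc on it to receive buffers, both omitted here. The function name is illustrative.

```swift
import CoreAudio
import Foundation

// Sketch: create a system-wide process tap (macOS 14.2+) that mixes down
// the output of every other process. Receiving audio additionally requires
// an aggregate device wrapping this tap plus an IO proc (not shown).
func makeSystemAudioTap() throws -> AudioObjectID {
    // An empty process list with a stereo mixdown captures all processes.
    let description = CATapDescription(stereoMixdownOfProcesses: [])
    description.isPrivate = true    // tap is not visible to other apps

    var tapID = AudioObjectID(kAudioObjectUnknown)
    let status = AudioHardwareCreateProcessTap(description, &tapID)
    guard status == noErr else {
        throw NSError(domain: NSOSStatusErrorDomain, code: Int(status))
    }
    return tapID
}
```

Note that creating the tap is what triggers the Screen Recording / system-audio permission requirement listed below.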
Transcription streams PCM16 audio at 24kHz over WebSocket to OpenAI's Realtime Transcription API (gpt-4o-transcribe), with server-side VAD for natural turn detection.
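The Float32 → PCM16LE → base64 step (the AudioChunker's job, per the project structure below) might look roughly like this; the function name is illustrative:

```swift
import Foundation

// Sketch: convert Float32 samples (range -1.0...1.0) into little-endian
// PCM16 bytes and base64-encode them, the format the Realtime API expects
// in `input_audio_buffer.append` messages.
func pcm16Base64Chunk(from samples: [Float]) -> String {
    var data = Data(capacity: samples.count * 2)
    for sample in samples {
        // Clamp to the valid range, then scale to Int16.
        let clamped = max(-1.0, min(1.0, sample))
        let value = Int16(clamped * Float(Int16.max))
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }
    return data.base64EncodedString()
}
```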
Summaries are generated via OpenAI Chat Completions (GPT-4o) in the same language as the transcript.
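A minimal Chat Completions call for summary generation could look like the sketch below. The prompt text is a placeholder (the app's real prompts live in SummaryPrompts.swift), and error handling and response decoding are simplified:

```swift
import Foundation

// Sketch: ask Chat Completions for a structured summary, replying in the
// transcript's own language (so Vietnamese meetings get Vietnamese notes).
func generateSummary(transcript: String, apiKey: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let body: [String: Any] = [
        "model": "gpt-4o",
        "messages": [
            ["role": "system",
             "content": "Summarize this meeting: key points, decisions, action items. Reply in the transcript's language."],
            ["role": "user", "content": transcript],
        ],
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    let choices = json?["choices"] as? [[String: Any]]
    let message = choices?.first?["message"] as? [String: Any]
    return message?["content"] as? String ?? ""
}
```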
- macOS 14.2+ (Sonoma)
- OpenAI API key
- Microphone permission
- Screen Recording permission (for system audio capture)
Pheme sends audio data only to OpenAI for transcription. No audio is stored locally or sent anywhere else. API keys are stored in UserDefaults on your Mac. Meeting transcripts and summaries are stored locally via SwiftData.
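The local SwiftData storage described above can be sketched with a model along these lines; the field names are illustrative, not the app's actual schema (see Meeting.swift in the project structure below):

```swift
import Foundation
import SwiftData

// Sketch: a minimal SwiftData model for locally persisted meetings.
// The real model also tracks speakers and transcript segments.
@Model
final class Meeting {
    var title: String
    var date: Date
    var transcript: String
    var summary: String

    init(title: String, date: Date = .now, transcript: String = "", summary: String = "") {
        self.title = title
        self.date = date
        self.transcript = transcript
        self.summary = summary
    }
}
```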
```
make generate   # Generate Xcode project
make build      # Build via xcodebuild
make run        # Build and run
make release    # Build release DMG
make clean      # Clean build artifacts
```

```
Sources/Pheme/
├── App/
│   ├── PhemeApp.swift — App entry, menu bar, onboarding
│   └── AppState.swift — Shared state, recent meetings, cleanup
├── Audio/
│   ├── MicRecorder.swift — 24kHz mono mic capture via AVAudioEngine
│   ├── SystemAudioRecorder.swift — System audio via Core Audio Taps
│   ├── AudioChunker.swift — Float32 → PCM16LE → base64 chunks
│   └── DualStreamMixer.swift — Routes mic + system to separate chunkers
├── Transcription/
│   ├── RealtimeTranscriber.swift — WebSocket client for OpenAI Realtime API
│   └── TranscriptionSession.swift — Orchestrates dual-stream transcription
├── Summary/
│   ├── SummaryGenerator.swift — GPT-4o title + summary generation
│   └── SummaryPrompts.swift — Bilingual prompt templates
├── Storage/
│   ├── Meeting.swift — SwiftData model with formatted transcript
│   └── TranscriptSegment.swift — Speaker enum (Me, Them, multi-speaker)
├── UI/
│   ├── MainContentView.swift — Split view: list + detail + transcript
│   ├── MeetingListView.swift — Sidebar with search and date grouping
│   ├── LiveTranscriptView.swift — Real-time scrolling transcript
│   ├── RecordingControlView.swift — Record/pause/stop buttons
│   ├── MenuBarView.swift — Menu bar controls
│   ├── SettingsView.swift — API key, preferences
│   └── OnboardingView.swift — First-launch permission wizard
└── System/
    ├── SoundFeedback.swift — Start/stop audio cues
    ├── LaunchAtLogin.swift — Auto-start at login
    ├── PermissionManager.swift — Permission checks
    └── CustomDictionary.swift — User-defined terms for transcription
```
| Technology | Purpose |
|---|---|
| Swift 5.9 | Language |
| SwiftUI + SwiftData | UI framework + persistence |
| AVFoundation | Microphone audio capture |
| Core Audio | System audio capture (CATapDescription) |
| OpenAI Realtime API | Live transcription via WebSocket |
| OpenAI Chat Completions | Summary generation (GPT-4o) |
| XcodeGen | Project generation |
Contributions are welcome! See CONTRIBUTING.md for guidelines.
- Kapt — macOS screenshot tool with annotation & OCR
- Yap — Push-to-talk dictation for Mac
- hidrix-tools — MCP server for web & social search
MIT — see LICENSE for details.
Named after Pheme (Φήμη) — the Greek goddess of fame, rumor, and voice. She heard everything and spread the word.

