
frameforge: add PortAudio live audio capture with VAD support #9

Merged
TheOriginalBytePlayer merged 16 commits into master from copilot/extend-audio-collection-capabilities
Jan 2, 2026

Copilot AI commented Jan 1, 2026

  • Explore repository structure and understand FrameForge SideCar application
  • Identify current audio input mechanism (IPC pipe and file-based)
  • Review build system (CMake)
  • Add PortAudio dependency to CMakeLists.txt
  • Create audio capture module using PortAudio
  • Integrate audio capture into main loop
  • Add command-line option for audio capture mode
  • Test compilation with and without PortAudio
  • Update documentation with dependency installation and usage instructions
  • Create test to verify audio capture functionality
  • Run code review
  • Address all review feedback (Pa_Terminate, signal handling, duplicate help line)
  • Run security check with CodeQL (no issues found)
  • Verify final build and test execution
  • Implement Voice Activity Detection (VAD)
  • Change audio processing to trigger after >=500ms speech + >=250ms silence
  • Fix race conditions in VAD state management
  • Address all code review feedback:
    • Fixed null pointer dereference for device_info
    • Made global PortAudio initialization thread-safe with mutex
    • Fixed reference counting (increment on init, decrement on destroy)
    • Prevent multiple initialize() calls with initialized_ flag
    • Fixed signal handler to be async-signal-safe (removed fprintf)
    • Moved signal handler setup to main() start - removed all duplicates
    • Fixed stream pointer handling on errors
    • Updated documentation to require both PortAudio AND Whisper
    • Removed duplicate CommandValidator initialization
    • Fixed CMake variable scope for tests
    • Added error handling for Pa_StopStream
    • Added note about set_callback thread safety
    • Added note about callback data copy performance
    • Enforced both PortAudio and Whisper requirement at compile time for live audio mode
    • Added maximum buffer size check (30s default) to prevent unbounded memory growth
    • Fixed callback signature to match PaStreamCallback exactly (no reinterpret_cast)
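
The mutex-protected, reference-counted global initialization mentioned in the list above could look roughly like the following. This is a minimal sketch, not the PR's code: `PortAudioSession` is a hypothetical name, and the `fake_pa` functions are stand-ins for `Pa_Initialize()` / `Pa_Terminate()` so the example compiles without PortAudio installed.

```cpp
#include <mutex>

// Stand-ins for Pa_Initialize()/Pa_Terminate() (assumption: the real calls
// would go here). live_inits() only exists so the behavior can be observed.
namespace fake_pa {
inline int& live_inits() { static int n = 0; return n; }
inline bool initialize() { ++live_inits(); return true; }
inline void terminate() { --live_inits(); }
}  // namespace fake_pa

// Hypothetical RAII guard: the first instance initializes the library,
// the last one to be destroyed terminates it.
class PortAudioSession {
public:
    PortAudioSession() {
        std::lock_guard<std::mutex> lock(mutex());
        // Only the first session performs the global init; on failure the
        // count is never incremented, so there is nothing to undo.
        if (ref_count() == 0 && !fake_pa::initialize()) return;
        ++ref_count();
        ok_ = true;
    }

    ~PortAudioSession() {
        if (!ok_) return;  // never counted, so nothing to release
        std::lock_guard<std::mutex> lock(mutex());
        if (--ref_count() == 0) fake_pa::terminate();
    }

    PortAudioSession(const PortAudioSession&) = delete;
    PortAudioSession& operator=(const PortAudioSession&) = delete;

    bool ok() const { return ok_; }

private:
    static std::mutex& mutex() { static std::mutex m; return m; }
    static int& ref_count() { static int n = 0; return n; }
    bool ok_ = false;
};
```

Tying the count to object lifetime means every error path that destroys the object also releases its reference, which is one way to get the "decrement on destroy" behavior described above.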

Summary

Successfully addressed ALL code review feedback:

  1. Multiple initialize() calls: Added initialized_ flag to prevent re-initialization
  2. Thread Safety: Added mutex protection for global PortAudio state
  3. Resource Management: Proper reference counting with decrement in destructor
  4. Error Handling: Decrement ref count on all error paths
  5. Stream Pointer Safety: Use local variable before assigning to stream_
  6. Callback Performance: Added note about data copy in callback (API constraint)
  7. Compile-time Dependency Check: Live audio mode now requires both PortAudio AND Whisper at compile time
  8. Documentation: Updated to mention both PortAudio and Whisper requirements
  9. Signal Handler Deduplication: Moved to main() start, removed all duplicates
  10. Buffer Size Limit: Added configurable max_buffer_duration_s (30s default) with rolling window
  11. Callback Signature: Fixed to match PaStreamCallback exactly (no undefined behavior)
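
As an illustration of item 10, a rolling-window buffer capped at a maximum duration might be sketched as follows. The class name `BoundedAudioBuffer` and its interface are assumptions made for this example; only the idea of a configurable maximum duration (30 s by default in the PR) with a rolling window comes from the summary above.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical bounded buffer: once the cap is reached, the oldest samples
// are discarded so that memory use stays fixed.
class BoundedAudioBuffer {
public:
    BoundedAudioBuffer(int sample_rate, double max_duration_s)
        : max_samples_(static_cast<std::size_t>(sample_rate * max_duration_s)) {}

    void append(const float* samples, std::size_t n) {
        if (n >= max_samples_) {
            // The new chunk alone fills the window: keep only its tail.
            samples += n - max_samples_;
            n = max_samples_;
            data_.clear();
        }
        const std::size_t total = data_.size() + n;
        if (total > max_samples_) {
            // Drop just enough of the oldest samples to make room.
            const auto overflow = static_cast<std::ptrdiff_t>(total - max_samples_);
            data_.erase(data_.begin(), data_.begin() + overflow);
        }
        data_.insert(data_.end(), samples, samples + n);
    }

    std::size_t size() const { return data_.size(); }
    const std::deque<float>& data() const { return data_; }

private:
    std::size_t max_samples_;
    std::deque<float> data_;
};
```

With an assumed 16 kHz mono stream, `BoundedAudioBuffer buf(16000, 30.0)` would hold at most 480,000 samples regardless of how long capture runs.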
Original prompt

Extend the FrameForgeSideCar application to directly collect the audio data via PortAudio rather than have it sent to it.



Copilot AI and others added 4 commits January 1, 2026 23:13
Co-authored-by: TheOriginalBytePlayer <18058224+TheOriginalBytePlayer@users.noreply.github.com>
Co-authored-by: TheOriginalBytePlayer <18058224+TheOriginalBytePlayer@users.noreply.github.com>
Co-authored-by: TheOriginalBytePlayer <18058224+TheOriginalBytePlayer@users.noreply.github.com>
…ling, remove duplicate help line

Co-authored-by: TheOriginalBytePlayer <18058224+TheOriginalBytePlayer@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Extend FrameForgeSideCar to directly collect audio data" to "frameforge: add PortAudio live audio capture support" Jan 1, 2026
Owner

TheOriginalBytePlayer commented Jan 1, 2026

@copilot Change audio processing rules so that after it receives > 500 ms of sound and there is a following pause of >=250ms it sends the sound to be processed.

@TheOriginalBytePlayer TheOriginalBytePlayer marked this pull request as ready for review January 1, 2026 23:47
Copilot AI review requested due to automatic review settings January 1, 2026 23:47
Owner

@TheOriginalBytePlayer TheOriginalBytePlayer left a comment

Change the audio processing so that after it receives >=500ms of sound followed by >=250ms of silence, it sends that audio to Whisper for processing.


Copilot AI left a comment

Pull request overview

This PR adds live audio capture functionality to the FrameForge sidecar application, enabling direct microphone input via PortAudio instead of relying solely on file or IPC-based audio input. The implementation includes optional PortAudio support with graceful fallback, signal handling for clean shutdown, and test coverage.

Key changes:

  • Audio capture abstraction layer with PortAudio implementation and stub fallback for builds without PortAudio
  • Live audio capture mode with continuous 2-second chunk processing through Whisper→Llama→validation pipeline
  • Signal handling (SIGINT/SIGTERM) for graceful shutdown in both live and server modes
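
A common way to get graceful shutdown on SIGINT/SIGTERM while keeping the handler async-signal-safe (the task list notes that `fprintf` was removed from it) is to have the handler do nothing but set a flag that the main loop polls. The sketch below shows that pattern under stated assumptions: the function and variable names are hypothetical and not taken from the PR.

```cpp
#include <csignal>

namespace {
// volatile sig_atomic_t is the type the C and C++ standards allow a signal
// handler to write to safely.
volatile std::sig_atomic_t g_stop_requested = 0;
}  // namespace

// The handler only sets the flag: no I/O, no allocation, no locks.
extern "C" void handle_stop_signal(int /*signum*/) {
    g_stop_requested = 1;
}

// Intended to be called once, near the start of main().
void install_stop_handlers() {
    std::signal(SIGINT, handle_stop_signal);
    std::signal(SIGTERM, handle_stop_signal);
}

// The capture or server loop polls this and exits cleanly when it is true.
bool stop_requested() {
    return g_stop_requested != 0;
}
```

Any logging about the shutdown then happens in the main loop after `stop_requested()` returns true, rather than inside the handler.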

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 18 comments.

| File | Description |
| --- | --- |
| tools/frameforge/frameforge-audio.h | Defines AudioCapture class with PortAudio implementation and stub fallback, including buffer management and callback support |
| tools/frameforge/frameforge-audio.cpp | Implements PortAudio integration with audio capture, buffering, and callback mechanisms |
| tools/frameforge/frameforge-sidecar.cpp | Adds live audio capture mode with signal handling, processes audio chunks continuously, integrates with existing Whisper/Llama pipeline |
| tools/frameforge/CMakeLists.txt | Adds optional PortAudio detection via pkg-config with conditional compilation support |
| tests/test-frameforge-audio.cpp | Adds test suite for audio configuration, initialization, and callback functionality |
| tests/CMakeLists.txt | Adds build configuration for audio capture tests with conditional PortAudio linking |
| tools/frameforge/README.md | Documents PortAudio as optional dependency, adds live audio capture mode usage examples and installation instructions |


Comment thread tools/frameforge/frameforge-audio.cpp
Comment thread tools/frameforge/frameforge-audio.cpp Outdated
Comment thread tools/frameforge/frameforge-audio.cpp
Comment thread tools/frameforge/frameforge-audio.cpp
Comment thread tests/CMakeLists.txt
Comment thread tools/frameforge/frameforge-audio.cpp Outdated
Comment thread tools/frameforge/frameforge-audio.cpp Outdated
Comment thread tools/frameforge/frameforge-sidecar.cpp Outdated
Comment thread tests/test-frameforge-audio.cpp
Comment thread tools/frameforge/frameforge-sidecar.cpp Outdated
…0ms silence

Co-authored-by: TheOriginalBytePlayer <18058224+TheOriginalBytePlayer@users.noreply.github.com>
Author

Copilot AI commented Jan 1, 2026

Change the audio processing so that after it receives >=500ms of sound followed by >=250ms of silence, it sends that audio to Whisper for processing.

Implemented in commit 910ac39. Audio processing now triggers automatically after detecting >=500ms of sound followed by >=250ms of silence. Uses RMS-based voice activity detection with configurable parameters.

@copilot Change audio processing rules so that after it receives > 500 ms of sound and there is a following pause of >=250ms it sends the sound to be processed.

Implemented VAD-based audio processing in commit 910ac39. The system now:

  • Detects when >=500ms of speech is received
  • Waits for >=250ms of silence after the speech
  • Then sends the audio to Whisper for processing

The VAD uses RMS (root mean square) to distinguish speech from silence, with configurable thresholds in AudioConfig.
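
To make the rule concrete, here is a minimal sketch of an RMS-based segmenter that emits an utterance after at least 500 ms of speech followed by at least 250 ms of silence. It is not the code from commit 910ac39: `VadParams`, `VadSegmenter`, the 16 kHz sample rate, the 0.01 threshold, and the choice to drop too-short sounds are all assumptions made for illustration, since the actual AudioConfig fields are not shown in this conversation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical parameters; only the two durations come from the PR text.
struct VadParams {
    int   sample_rate    = 16000;  // assumed
    float rms_threshold  = 0.01f;  // assumed energy threshold for "speech"
    int   min_speech_ms  = 500;    // >=500 ms of sound
    int   min_silence_ms = 250;    // >=250 ms of trailing silence
};

// Root mean square of one frame of float samples.
inline float frame_rms(const float* samples, std::size_t n) {
    if (n == 0) return 0.0f;
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += double(samples[i]) * samples[i];
    return static_cast<float>(std::sqrt(sum / static_cast<double>(n)));
}

class VadSegmenter {
public:
    explicit VadSegmenter(VadParams p) : p_(p) {}

    // Feed one frame. Returns true when a finished utterance was moved to `out`.
    bool push(const float* samples, std::size_t n, std::vector<float>& out) {
        const bool loud = frame_rms(samples, n) >= p_.rms_threshold;
        if (!loud && speech_samples_ == 0) return false;  // skip leading silence

        buffer_.insert(buffer_.end(), samples, samples + n);
        if (loud) {
            speech_samples_ += n;   // accumulated speech in this utterance
            silence_samples_ = 0;   // any sound restarts the silence timer
        } else {
            silence_samples_ += n;
        }

        if (silence_samples_ < to_samples(p_.min_silence_ms)) return false;
        const bool long_enough = speech_samples_ >= to_samples(p_.min_speech_ms);
        if (long_enough) out.swap(buffer_);  // hand the utterance to the caller
        reset();  // a too-short sound is dropped here (design assumption)
        return long_enough;
    }

private:
    std::size_t to_samples(int ms) const {
        return static_cast<std::size_t>(p_.sample_rate) *
               static_cast<std::size_t>(ms) / 1000;
    }
    void reset() {
        buffer_.clear();
        speech_samples_ = 0;
        silence_samples_ = 0;
    }

    VadParams p_;
    std::vector<float> buffer_;
    std::size_t speech_samples_ = 0;
    std::size_t silence_samples_ = 0;
};
```

A caller would feed fixed-size frames from the capture buffer and pass `out` to the transcription step whenever `push` returns true. A fixed RMS threshold is simple but sensitive to microphone gain and background noise, which is presumably why the PR exposes it as a configurable value.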

Co-authored-by: TheOriginalBytePlayer <18058224+TheOriginalBytePlayer@users.noreply.github.com>
Copilot AI changed the title from "frameforge: add PortAudio live audio capture support" to "frameforge: add PortAudio live audio capture with VAD support" Jan 1, 2026
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
TheOriginalBytePlayer and others added 2 commits January 1, 2026 16:16
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ization

Co-authored-by: TheOriginalBytePlayer <18058224+TheOriginalBytePlayer@users.noreply.github.com>
Author

Copilot AI commented Jan 2, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

@TheOriginalBytePlayer
Owner

@copilot retry

Author

Copilot AI commented Jan 2, 2026

@copilot retry

All work has been successfully completed and verified:

✅ Build succeeds with PortAudio support enabled
✅ All tests pass (test-frameforge-audio)
✅ Application help displays correctly with --live-audio option
✅ All code review feedback has been addressed in commits d3a5a0d, b2730e5, and earlier

The PR is ready for review. All issues mentioned in the code review have been fixed:

  • Thread-safe PortAudio initialization with mutex
  • Proper reference counting and Pa_Terminate() cleanup
  • Multiple initialize() calls prevented
  • Stream pointer safety improved
  • CMake variables properly cached for cross-subdirectory access
  • Documentation updated to require both PortAudio and Whisper
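
Two of the fixes listed above, rejecting repeated `initialize()` calls and not touching the stream member until opening succeeds, can be sketched together. This is an illustrative outline only: `CaptureSketch`, `FakeStream`, and `fake_open_stream` are hypothetical stand-ins for the PR's AudioCapture class and the PortAudio stream-opening call, and returning `false` on a second call is an assumed policy.

```cpp
#include <memory>
#include <utility>

// Stand-in for a PortAudio stream handle (assumption).
struct FakeStream { int id; };

// Stand-in for opening a stream; returns null on failure.
inline std::unique_ptr<FakeStream> fake_open_stream(bool succeed) {
    return succeed ? std::unique_ptr<FakeStream>(new FakeStream{1}) : nullptr;
}

class CaptureSketch {
public:
    bool initialize(bool device_available) {
        if (initialized_) return false;  // a second initialize() is rejected

        // Open into a local first, so the member is never left pointing at
        // a half-opened stream when an error occurs.
        std::unique_ptr<FakeStream> local = fake_open_stream(device_available);
        if (!local) return false;        // stream_ is untouched on failure

        stream_ = std::move(local);
        initialized_ = true;
        return true;
    }

    bool initialized() const { return initialized_; }
    bool has_stream() const { return stream_ != nullptr; }

private:
    std::unique_ptr<FakeStream> stream_;
    bool initialized_ = false;
};
```

Because a failed attempt leaves both members unchanged, the caller can retry `initialize()` later, for example after a device becomes available.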

TheOriginalBytePlayer and others added 2 commits January 1, 2026 17:36
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: TheOriginalBytePlayer <18058224+TheOriginalBytePlayer@users.noreply.github.com>
Author

Copilot AI commented Jan 2, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

…add buffer size limit, fix callback signature

Co-authored-by: TheOriginalBytePlayer <18058224+TheOriginalBytePlayer@users.noreply.github.com>
Author

Copilot AI commented Jan 2, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

@TheOriginalBytePlayer TheOriginalBytePlayer merged commit 1d0baf6 into master Jan 2, 2026
37 of 70 checks passed
@TheOriginalBytePlayer TheOriginalBytePlayer deleted the copilot/extend-audio-collection-capabilities branch January 2, 2026 06:28