A local, real-time speech-to-text utility for Linux using Whisper.
Handsfree is a utility that provides fast, local speech-to-text transcription for Linux. Transcription is performed entirely offline on your machine via the efficient faster-whisper library based on OpenAI's Whisper model. It's controlled via a simple command-line interface (handsfreectl) and is designed primarily for dictation, outputting the transcribed text either as simulated keyboard input or directly to the system clipboard. This makes it particularly suitable for Linux desktop users who need a flexible hands-free input method.
Current Status: Handsfree is in active development and used daily by the maintainer.
Key Features:
- Local & Private: All audio processing and transcription happens on your machine.
- High-Quality Transcriptions: Leverages the faster-whisper library based on OpenAI's Whisper model for accurate results.
- Flexible Control: Simple CLI (handsfreectl) that allows starting/stopping transcription, making it easy to integrate with various triggers like keyboard shortcuts, scripts or even foot pedals.
- Configurable Output: Transcribed text can be output as simulated keyboard input or copied to the clipboard using external tools configurable via config.toml.
- Voice Activity Detection (VAD): Optional VAD using the enterprise-grade Silero model allows automatic start/stop based on speech presence.
- Configurable: Behavior is tuned via a simple TOML configuration file.
Handsfree consists of two main components:
- handsfreed: The Python daemon responsible for audio processing and speech-to-text transcription.
- handsfreectl: The Rust command-line interface for controlling the handsfreed daemon.
Handsfree aims to:
- Fill a gap in easy-to-use real-time dictation utilities specifically for Linux desktop environments.
- Provide a robust, entirely offline speech-to-text solution, keeping your data private.
- Offer a flexible and customizable utility that is tailored to your workflow. You control how dictation is triggered (e.g., mapping handsfreectl commands to window manager keybindings) and how the daemon is managed (e.g., using the provided systemd user service).
1. Install System Dependencies
First, you need to install the PortAudio library, which is required by the handsfreed daemon for audio processing.
- Debian/Ubuntu:
sudo apt-get install portaudio19-dev
- Fedora/CentOS/RHEL:
sudo dnf install portaudio-devel
- Arch Linux:
sudo pacman -S portaudio
2. Install Handsfree
Install the two main components, handsfreed (the daemon) and handsfreectl (the controller), from their respective package registries (PyPI and Crates.io).
- Install handsfreed from PyPI:
pip install handsfreed
- Install handsfreectl from Crates.io:
cargo install handsfreectl
- Alternatively, download a pre-compiled handsfreectl binary from the handsfreectl releases page.
3. Configure and Run
After installation, you need to:
- Create a configuration file as described in the Configuration section.
- Run the daemon as described in the Usage section.
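Before moving on to configuration, it can be useful to confirm that the pieces installed above are actually visible to your shell. The following is a minimal sketch, not part of Handsfree itself: `check_install` is a hypothetical helper name, and `ctypes.util.find_library` may miss libraries installed in non-standard locations, so a negative result for PortAudio is a hint rather than proof.

```python
import ctypes.util
import shutil


def check_install(commands=("handsfreed", "handsfreectl")):
    """Return a dict mapping each prerequisite name to True (found) or False."""
    # PortAudio is the system library that the daemon needs for audio capture.
    report = {"portaudio": ctypes.util.find_library("portaudio") is not None}
    # The daemon and the controller should both be reachable through PATH.
    for name in commands:
        report[name] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    for item, found in check_install().items():
        print(f"{item}: {'found' if found else 'MISSING'}")
```

If `handsfreectl` is reported as missing after `cargo install`, a common cause is that Cargo's binary directory is not on your PATH.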
Prerequisites:
- Nix package manager installed.
- Flakes support enabled (add experimental-features = nix-command flakes to your Nix configuration if needed).
- Home Manager (optional but recommended for managing the service and configuration).
Steps:
- Add Handsfree Flake Input: Add this repository as an input to your system or home-manager flake configuration:
# Example: flake.nix inputs section
inputs = {
  # ... other inputs like nixpkgs, home-manager ...
  handsfree.url = "github:achyudh/handsfree";
  # Ensure nixpkgs versions match if needed
  # handsfree.inputs.nixpkgs.follows = "nixpkgs";
  # handsfree.inputs.home-manager.follows = "home-manager";
};
- Configure the Home Manager Service: Import the module and configure the service in your home-manager configuration (home.nix or similar):
# Example: home.nix
{ inputs, pkgs, config, ... }:
{
  # Import the handsfree home-manager module and setup the overlay
  imports = [ inputs.handsfree.homeManagerModules.default ];
  nixpkgs.overlays = [ inputs.handsfree.overlay ];

  # Enable and configure the daemon service
  services.handsfree = {
    enable = true;
    # The module automatically configures and manages the
    # handsfreed systemd user service.
    # Check the example config.toml below for more settings
    settings = {
      whisper = {
        model = "base.en"; # Choose desired model
        device = "cpu"; # Or "cuda" if applicable
        compute_type = "int8"; # Or "auto", "float16" etc.
      };
      vad = {
        enabled = true; # Enable VAD segmentation
        min_silence_duration_ms = 1024; # Adjust silence timing
        pre_roll_duration_ms = 256;
      };
      output = {
        # Example for Wayland (using wtype/wl-copy)
        keyboard_command = "wtype -";
        clipboard_command = "wl-copy";
        # Example for X11 (using xdotool/xclip)
        # keyboard_command = "xdotool type --clearmodifiers --file -";
        # clipboard_command = "xclip -selection clipboard -in";
      };
    };
  };

  # Alternatively, you can only install the packages instead of the service
  home.packages = [ pkgs.handsfreectl pkgs.handsfreed ];
}
- Apply Configuration: Run your NixOS or home-manager rebuild/switch command.
Call for Contributions: Packaging for other distributions and other package managers is welcome! Please open an issue if you'd like to help make Handsfree more accessible.
Handsfree uses a configuration file located at ~/.config/handsfree/config.toml. If you are using the Nix home-manager module, the settings you provide there will generate this file automatically. If running manually, you need to create this file.
# Example configuration for handsfreed daemon
[audio]
# Input gain multiplier (1.0 = no gain).
# input_gain = 1.5
# Enable DC offset correction for raw audio.
dc_offset_correction = true
# Window size for DC offset calculation (ms).
dc_offset_window_ms = 512
[whisper]
# Whisper model identifier (e.g., small.en, medium.en).
model = "small.en"
# Device for inference (auto, cpu, cuda).
device = "auto"
# Compute type for inference (auto, float32, float16, int8).
compute_type = "auto"
# Optional language code (empty for auto-detect).
language = "en"
# Beam size for search (1-10, higher is slower but more accurate).
beam_size = 3
# Number of CPU threads for inference (0 = auto).
cpu_threads = 0
[vad]
# Enable Voice Activity Detection.
enabled = false
# Speech probability threshold (0.0-1.0).
threshold = 0.5
# Minimum duration for a speech segment (ms).
min_speech_duration_ms = 256
# Minimum duration of silence to end a speech segment (ms).
min_silence_duration_ms = 1024
# Pre-roll duration to include before a detected speech segment (ms).
pre_roll_duration_ms = 192
# Negative threshold for speech detection (0.0-1.0, optional).
neg_threshold = 0.35
# Maximum duration of a single speech segment in seconds (0 = unlimited).
max_speech_duration_s = 0.0
# Maximum duration in seconds before listening stops (0 = disabled)
auto_disable_duration_s = 5.0
[output]
# Command to execute for keyboard output.
keyboard_command = "wtype -"
# Command to execute for clipboard output.
clipboard_command = "wl-copy"
[daemon]
# Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL).
log_level = "INFO"
# Optional custom log file path
# Default: ~/.local/state/handsfree/handsfreed.log
# log_file = "/var/log/handsfreed.log"
# Optional custom socket path
# Default: $XDG_RUNTIME_DIR/handsfree/daemon.sock or /tmp/handsfree-$USER.sock
# socket_path = "/var/run/handsfree/daemon.sock"
# Duration of audio processing chunks in seconds.
# This is only used if VAD is disabled
# time_chunk_s = 5.0
- Manual / Development: Open a terminal, navigate to the handsfreed source directory (and activate the venv if used), then run:
python -m handsfreed
Stop the daemon with Ctrl+C. Check stdout or the log file specified in the config (if configured) for details.
- PyPI Installation: If you installed via pip, the handsfreed command should be in your PATH:
handsfreed
You can also set up a systemd user service manually if desired.
- Nix / Home Manager: The handsfreed daemon runs as a systemd user service. It starts automatically on login. You can manage it using:
  - Check status: systemctl --user status handsfree.service
  - Start/Stop/Restart: systemctl --user start|stop|restart handsfree.service
  - View logs: journalctl --user -u handsfree.service -f
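Whichever way the daemon is started, a quick way to confirm it is reachable is to look for its socket. The sketch below is an illustration only: it assumes the defaults listed in the example configuration ($XDG_RUNTIME_DIR/handsfree/daemon.sock, otherwise /tmp/handsfree-$USER.sock) and assumes the /tmp path is used when XDG_RUNTIME_DIR is unset. The helper names `default_socket_path` and `daemon_socket_present` are hypothetical and not part of Handsfree.

```python
import os
from pathlib import Path


def default_socket_path(env=None):
    """Resolve the documented default socket location from the environment."""
    env = os.environ if env is None else env
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if runtime_dir:
        return Path(runtime_dir) / "handsfree" / "daemon.sock"
    # Assumption: the /tmp location is the fallback when no runtime dir is set.
    return Path(f"/tmp/handsfree-{env.get('USER', '')}.sock")


def daemon_socket_present(path=None):
    """Return True if a Unix socket exists at the given (or default) path."""
    path = default_socket_path() if path is None else Path(path)
    return path.is_socket()


if __name__ == "__main__":
    location = default_socket_path()
    state = "present" if daemon_socket_present(location) else "missing"
    print(f"{location}: {state}")
```

If you set socket_path in config.toml, pass that path to `daemon_socket_present` instead of relying on the default.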
The handsfreectl command communicates with the running handsfreed daemon. An easy way to get started is to bind keys in your window manager or hotkey daemon to execute handsfreectl toggle.
- Start Transcription: Explicitly tells the daemon to start listening.
handsfreectl start --output keyboard # Default
handsfreectl start --output clipboard
- Stop Transcription: Explicitly tells the daemon to stop the current listening session.
handsfreectl stop
- Toggle Transcription: Toggles the transcription state. If Idle, it starts listening. If Listening, it stops. This is ideal for binding to a single hotkey.
handsfreectl toggle
handsfreectl toggle --output clipboard
- Check Status: Queries the daemon's current state once.
handsfreectl status
Possible outputs include Idle, Listening, Processing, Error, or Inactive.
- Watch Status: Streams status updates in real-time. Efficient for status bars (like Waybar or Polybar) as it avoids polling.
handsfreectl watch
- Shutdown Daemon: Tells the handsfreed process to shut down cleanly.
handsfreectl shutdown
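As an illustration of the status-bar use case, here is a rough sketch of a wrapper that turns the watch stream into the JSON lines a Waybar custom module can consume. It rests on an assumption that is not confirmed here: that handsfreectl watch prints one line per update containing one of the state names listed above. The names `to_waybar` and `main`, and the "Unknown" fallback label, are hypothetical.

```python
import json
import subprocess

# State names as listed for `handsfreectl status`.
STATES = ("Idle", "Listening", "Processing", "Error", "Inactive")


def to_waybar(line):
    """Convert one line of status output into a Waybar-style JSON string."""
    lowered = line.lower()
    # Assumption: the state name appears somewhere in the line.
    state = next((s for s in STATES if s.lower() in lowered), "Unknown")
    return json.dumps(
        {"text": state, "class": state.lower(), "tooltip": f"Handsfree: {state}"}
    )


def main():
    """Stream `handsfreectl watch` and print one JSON object per update."""
    try:
        proc = subprocess.Popen(
            ["handsfreectl", "watch"], stdout=subprocess.PIPE, text=True
        )
    except FileNotFoundError:
        # handsfreectl is not installed or not on PATH.
        print(to_waybar("Inactive"), flush=True)
        return 1
    for line in proc.stdout:
        print(to_waybar(line), flush=True)
    return proc.wait()
```

To use it, call `main()` from a small script and point a Waybar custom module with "return-type": "json" at that script. The "class" field then lets you style each state separately in your Waybar CSS.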
- handsfreectl status shows Inactive: The handsfreed daemon is not running or handsfreectl cannot find the communication socket (daemon.sock). Check the service status (systemctl --user status handsfree.service) and daemon logs (journalctl or the log file). Ensure socket paths match if configured manually.
- handsfreectl shows Connection Error / Communication Error: The daemon might have crashed, or there might be permission issues with the socket file. Check daemon logs.
- No transcription output: Check that the [output] commands in config.toml are correct for your system (Wayland/X11) and that the required tools (wtype, xdotool, wl-copy, xclip) are installed and in your $PATH. Check daemon logs for transcription or output errors.
- VAD doesn't trigger / triggers too often / cuts off speech: Adjust parameters in the [vad] section of config.toml, particularly threshold, neg_threshold, min_silence_duration_ms, and pre_roll_duration_ms. Check daemon logs for VAD state transitions (enable DEBUG level).
This project is licensed under the GNU General Public License v3.0.
Handsfree would not exist without several fantastic open-source projects:
- SYSTRAN/faster-whisper for providing a blazing-fast Whisper implementation.
- snakers4/silero-vad for a fast, enterprise-grade VAD implementation.
- KoljaB/RealtimeSTT whose source code provided valuable insights and inspiration for VAD and real-time processing approaches.