Handsfree


A local, real-time speech-to-text utility for Linux using Whisper.

Overview

Handsfree is a utility that provides fast, local speech-to-text transcription for Linux. Transcription runs entirely offline on your machine using faster-whisper, an efficient implementation of OpenAI's Whisper model. It is controlled through a simple command-line interface (handsfreectl) and is designed primarily for dictation: the transcribed text is delivered either as simulated keyboard input or directly to the system clipboard. This makes it particularly suitable for Linux desktop users who need a flexible hands-free input method.

Current Status: Handsfree is in active development and used daily by the maintainer.

Key Features:

  • Local & Private: All audio processing and transcription happens on your machine.
  • High-Quality Transcriptions: Leverages the faster-whisper library based on OpenAI's Whisper model for accurate results.
  • Flexible Control: Simple CLI (handsfreectl) that allows starting/stopping transcription, making it easy to integrate with various triggers like keyboard shortcuts, scripts or even foot pedals.
  • Configurable Output: Transcribed text can be output as simulated keyboard input or copied to the clipboard using external tools configurable via config.toml.
  • Voice Activity Detection (VAD): Optional VAD using the enterprise-grade Silero model allows automatic start/stop based on speech presence.
  • Configurable: Behavior is tuned through a single TOML configuration file.

Handsfree consists of two main components: handsfreed (the daemon) and handsfreectl (the command-line controller).

Motivation

Handsfree aims to:

  • Fill a gap in easy-to-use real-time dictation utilities specifically for Linux desktop environments.
  • Provide a robust, entirely offline speech-to-text solution, keeping your data private.
  • Offer a flexible and customizable utility that is tailored to your workflow. You control how dictation is triggered (e.g., mapping handsfreectl commands to window manager keybindings) and how the daemon is managed (e.g., using the provided systemd user service).

Installation

Manual Installation

1. Install System Dependencies

First, you need to install the PortAudio library, which is required by the handsfreed daemon for audio processing.

  • Debian/Ubuntu:
    sudo apt-get install portaudio19-dev
  • Fedora/CentOS/RHEL:
    sudo dnf install portaudio-devel
  • Arch Linux:
    sudo pacman -S portaudio
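
After installing the package, one quick way to confirm that the PortAudio shared library is visible to the system's dynamic linker is Python's standard `ctypes.util.find_library`. This is only a sanity-check sketch. How handsfreed itself locates PortAudio is not described here and may differ, and the helper name `find_portaudio` is made up for illustration.

```python
import ctypes.util


def find_portaudio():
    """Return the PortAudio library name the dynamic linker resolves, or None."""
    return ctypes.util.find_library("portaudio")


if __name__ == "__main__":
    lib = find_portaudio()
    if lib:
        print(f"PortAudio found: {lib}")
    else:
        print("PortAudio not found; install it with your package manager.")
```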

2. Install Handsfree

Install the two main components, handsfreed (the daemon) and handsfreectl (the controller), from their respective package managers.

  • Install handsfreed from PyPI:
    pip install handsfreed
  • Install handsfreectl:

3. Configure and Run

After installation, you need to:

  1. Create a configuration file as described in the Configuration section.
  2. Run the daemon as described in the Usage section.

Nix Flake

Prerequisites:

  • Nix package manager installed.
  • Flakes support enabled (add experimental-features = nix-command flakes to your Nix configuration if needed).
  • Home Manager (optional but recommended for managing the service and configuration).

Steps:

  1. Add Handsfree Flake Input: Add this repository as an input to your system or home-manager flake configuration:

    # Example: flake.nix inputs section
    inputs = {
      # ... other inputs like nixpkgs, home-manager ...
      handsfree.url = "github:achyudh/handsfree";
      # Ensure nixpkgs versions match if needed
      # handsfree.inputs.nixpkgs.follows = "nixpkgs";
      # handsfree.inputs.home-manager.follows = "home-manager";
    };
  2. Configure the Home Manager Service: Import the module and configure the service in your home-manager configuration (home.nix or similar):

    # Example: home.nix
    { inputs, pkgs, config, ... }: {
    
      # Import the handsfree home-manager module and setup the overlay
      imports = [ inputs.handsfree.homeManagerModules.default ];
      nixpkgs.overlays = [ inputs.handsfree.overlay ];
    
      # Enable and configure the daemon service
      services.handsfree = {
        enable = true;
        # The module automatically configures and manages the
        # handsfreed systemd user service.
    
        # Check the example config.toml below for more settings
        settings = {
          whisper = {
            model = "base.en"; # Choose desired model
            device = "cpu"; # Or "cuda" if applicable
            compute_type = "int8"; # Or "auto", "float16" etc.
          };
          vad = {
            enabled = true; # Enable VAD segmentation
            min_silence_duration_ms = 1024; # Adjust silence timing
            pre_roll_duration_ms = 256;
          };
          output = {
            # Example for Wayland (using wtype/wl-copy)
            keyboard_command = "wtype -";
            clipboard_command = "wl-copy";
    
            # Example for X11 (using xdotool/xclip)
            # keyboard_command = "xdotool type --clearmodifiers --file -";
            # clipboard_command = "xclip -selection clipboard -in";
          };
        };
      };
    
      # Alternatively, install just the packages without enabling the service
      home.packages = [ pkgs.handsfreectl pkgs.handsfreed ];
    }
  3. Apply Configuration: Run your NixOS or home-manager rebuild/switch command.

Call for Contributions: Packaging for other distributions and other package managers is welcome! Please open an issue if you'd like to help make Handsfree more accessible.

Configuration

Handsfree uses a configuration file located at ~/.config/handsfree/config.toml. If you are using the Nix home-manager module, the settings you provide there will generate this file automatically. If running manually, you need to create this file.

# Example configuration for handsfreed daemon

[audio]
# Input gain multiplier (1.0 = no gain).
# input_gain = 1.5

# Enable DC offset correction for raw audio.
dc_offset_correction = true

# Window size for DC offset calculation (ms).
dc_offset_window_ms = 512

[whisper]
# Whisper model identifier (e.g., small.en, medium.en).
model = "small.en"

# Device for inference (auto, cpu, cuda).
device = "auto"

# Compute type for inference (auto, float32, float16, int8).
compute_type = "auto"

# Optional language code (empty for auto-detect).
language = "en"

# Beam size for search (1-10, higher is slower but more accurate).
beam_size = 3

# Number of CPU threads for inference (0 = auto).
cpu_threads = 0

[vad]
# Enable Voice Activity Detection.
enabled = false

# Speech probability threshold (0.0-1.0).
threshold = 0.5

# Minimum duration for a speech segment (ms).
min_speech_duration_ms = 256

# Minimum duration of silence to end a speech segment (ms).
min_silence_duration_ms = 1024

# Pre-roll duration to include before a detected speech segment (ms).
pre_roll_duration_ms = 192

# Negative threshold for speech detection (0.0-1.0, optional).
neg_threshold = 0.35

# Maximum duration of a single speech segment in seconds (0 = unlimited).
max_speech_duration_s = 0.0

# Maximum duration in seconds before listening stops (0 = disabled)
auto_disable_duration_s = 5.0

[output]
# Command to execute for keyboard output.
keyboard_command = "wtype -"

# Command to execute for clipboard output.
clipboard_command = "wl-copy"

[daemon]
# Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL).
log_level = "INFO"

# Optional custom log file path
# Default: ~/.local/state/handsfree/handsfreed.log
# log_file = "/var/log/handsfreed.log"

# Optional custom socket path
# Default: $XDG_RUNTIME_DIR/handsfree/daemon.sock or /tmp/handsfree-$USER.sock
# socket_path = "/var/run/handsfree/daemon.sock"

# Duration of audio processing chunks in seconds.
# This is only used if VAD is disabled
# time_chunk_s = 5.0

Usage

Daemon Management

  • Manual / Development: Open a terminal, navigate to the handsfreed source directory (and activate the venv if used), then run:

    python -m handsfreed

    Stop the daemon with Ctrl+C. Check stdout or the log file specified in the config (if configured) for details.

  • PyPI Installation: If you installed via pip, the handsfreed command should be in your PATH.

    handsfreed

    You can also set up a systemd user service manually if desired.

  • Nix / Home Manager: The handsfreed daemon runs as a systemd user service. It starts automatically on login. You can manage it using:

    • Check status: systemctl --user status handsfree.service
    • Start/Stop/Restart: systemctl --user start|stop|restart handsfree.service
    • View logs: journalctl --user -u handsfree.service -f
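
Whichever way the daemon is started, it should create its communication socket. The following sketch checks for that socket at the default locations listed in the [daemon] section of the example configuration. It assumes the /tmp fallback applies when XDG_RUNTIME_DIR is unset, which the configuration comment implies but does not state outright. The function names are illustrative and not part of Handsfree.

```python
import os
import stat
from pathlib import Path


def default_socket_path(env=None):
    """Resolve the default socket location described in the config comments."""
    env = os.environ if env is None else env
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if runtime_dir:
        return Path(runtime_dir) / "handsfree" / "daemon.sock"
    # Assumed fallback when XDG_RUNTIME_DIR is not set.
    user = env.get("USER", "")
    return Path(f"/tmp/handsfree-{user}.sock")


def socket_exists(path):
    """Return True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


if __name__ == "__main__":
    path = default_socket_path()
    print(f"{path}: {'socket present' if socket_exists(path) else 'no socket found'}")
```

If you set socket_path in config.toml, check that path instead.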

CLI Usage

The handsfreectl command communicates with the running handsfreed daemon. An easy way to get started is to bind keys in your window manager or hotkey daemon to execute handsfreectl toggle.

  • Start Transcription: Explicitly tells the daemon to start listening.

    handsfreectl start --output keyboard  # Default
    handsfreectl start --output clipboard
  • Stop Transcription: Explicitly tells the daemon to stop the current listening session.

    handsfreectl stop
  • Toggle Transcription: Toggles the transcription state. If Idle, it starts listening. If Listening, it stops. This is ideal for binding to a single hotkey.

    handsfreectl toggle
    handsfreectl toggle --output clipboard
  • Check Status: Queries the daemon's current state once.

    handsfreectl status

    Possible outputs include Idle, Listening, Processing, Error, or Inactive.

  • Watch Status: Streams status updates in real-time. Efficient for status bars (like Waybar or Polybar) as it avoids polling.

    handsfreectl watch
  • Shutdown Daemon: Tells the handsfreed process to shut down cleanly.

    handsfreectl shutdown
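
For status bars, the watch command can be wrapped in a small script. The sketch below assumes that handsfreectl watch writes one state name per line to stdout, which this README does not specify, so verify the actual output format before relying on it. The labels are hypothetical; only the state names come from the list of possible status outputs above.

```python
import subprocess

# Hypothetical labels for a status bar, keyed by the documented state names.
LABELS = {
    "Idle": "mic off",
    "Listening": "mic on",
    "Processing": "transcribing",
    "Error": "error",
    "Inactive": "daemon down",
}


def label_for(line):
    """Map one line of status output to a label; unknown states pass through."""
    state = line.strip()
    return LABELS.get(state, state)


def follow(command):
    """Yield a label for each non-empty line the command writes to stdout."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            if line.strip():
                yield label_for(line)


if __name__ == "__main__":
    try:
        for label in follow(["handsfreectl", "watch"]):
            print(label, flush=True)  # flush so the bar updates immediately
    except FileNotFoundError:
        print("handsfreectl not found in PATH")
```

Because the script blocks on the subprocess output rather than calling handsfreectl status in a loop, it keeps the no-polling benefit of watch.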

Troubleshooting

  • handsfreectl status shows Inactive: The handsfreed daemon is not running or handsfreectl cannot find the communication socket (daemon.sock). Check the service status (systemctl --user status handsfree.service) and daemon logs (journalctl or the log file). Ensure socket paths match if configured manually.
  • handsfreectl shows Connection Error / Communication Error: Daemon might have crashed, or there might be permission issues with the socket file. Check daemon logs.
  • No transcription output: Check that the [output] commands in config.toml are correct for your display server (Wayland or X11) and that the required tools (wtype, xdotool, wl-copy, xclip) are installed and in your $PATH. Check daemon logs for transcription or output errors.
  • VAD doesn't trigger / triggers too often / cuts off speech: Adjust parameters in the [vad] section of config.toml, particularly threshold, neg_threshold, min_silence_duration_ms, and pre_roll_duration_ms. Check daemon logs for VAD state transitions (enable DEBUG level).
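
When tuning the VAD parameters, it helps to have a mental model of how they interact. The sketch below is a simplified illustration of threshold-based segmentation with hysteresis, in the general style of Silero VAD. It is not handsfreed's actual implementation. The 32 ms frame length is an assumption, and the other defaults mirror the example config.toml.

```python
def segment_speech(probs, frame_ms=32, threshold=0.5, neg_threshold=0.35,
                   min_speech_ms=256, min_silence_ms=1024, pre_roll_ms=192):
    """Return (start_ms, end_ms) segments from per-frame speech probabilities."""
    segments = []
    speech_start = None   # onset time of the current segment, None when idle
    silence_start = None  # time at which the pending run of silence began

    for i, p in enumerate(probs):
        now = i * frame_ms
        if speech_start is None:
            if p >= threshold:
                speech_start = now  # speech begins only above `threshold`
            continue
        if p >= threshold:
            silence_start = None  # confident speech cancels pending silence
        elif p < neg_threshold and silence_start is None:
            silence_start = now  # silence counts only below `neg_threshold`
        # Probabilities between the two thresholds leave the state unchanged.
        if silence_start is not None and now + frame_ms - silence_start >= min_silence_ms:
            if silence_start - speech_start >= min_speech_ms:
                # Pre-roll extends the segment backwards to keep the first syllable.
                segments.append((max(0, speech_start - pre_roll_ms), silence_start))
            speech_start = None
            silence_start = None

    if speech_start is not None:  # input ended while still in a segment
        end = silence_start if silence_start is not None else len(probs) * frame_ms
        if end - speech_start >= min_speech_ms:
            segments.append((max(0, speech_start - pre_roll_ms), end))
    return segments


if __name__ == "__main__":
    # 10 quiet frames, 20 speech frames, 40 quiet frames.
    probs = [0.1] * 10 + [0.9] * 20 + [0.1] * 40
    print(segment_speech(probs))  # → [(128, 960)]
```

In this model, raising threshold makes segments harder to start, lowering neg_threshold makes them harder to end, a larger min_silence_duration_ms tolerates longer pauses mid-sentence, and a larger pre_roll_duration_ms guards against clipped word beginnings.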

License

This project is licensed under the GNU General Public License v3.0.

Acknowledgements

Handsfree would not exist without several fantastic open-source projects:
