19 changes: 1 addition & 18 deletions README.md
Expand Up @@ -9,7 +9,7 @@ Supports announcements, start/continue conversation, and timers.

Install system dependencies (`apt-get`):

* `libportaudio2` or `portaudio19-dev` (for `sounddevice`)
* `libportaudio2` (for `sounddevice`)
* `build-essential` (for `pymicro-features`)
* `libmpv-dev` (for `python-mpv`)

Expand All @@ -25,25 +25,8 @@ script/setup

Use `script/run` or `python3 -m linux_voice_assistant`

You must specify `--name <NAME>` with a name that will be available in Home Assistant.

See `--help` for more options.

### Microphone

Use `--audio-input-device` to change the microphone device. Use `python3 -m sounddevice` to see the available PortAudio devices.

The microphone device **must** support 16 kHz mono audio.

### Speaker

Use `--audio-output-device` to change the speaker device. Use `mpv --audio-device=help` to see the available MPV devices.

## Wake Word

Change the default wake word with `--wake-model <id>` where `<id>` is the name of a model in the `wakewords` directory. For example, `--wake-model hey_jarvis` will load `wakewords/hey_jarvis.tflite` by default.
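The id-to-file mapping described above can be sketched as follows (an illustrative resolver, not the project's actual loader):

``` python
from pathlib import Path


def resolve_wake_model(wake_word_dir: str, model_id: str) -> Path:
    """Map a wake model id like 'hey_jarvis' to wakewords/hey_jarvis.tflite."""
    model_path = Path(wake_word_dir) / f"{model_id}.tflite"
    if not model_path.exists():
        raise FileNotFoundError(f"No wake word model: {model_path}")
    return model_path
```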


## Connecting to Home Assistant

1. In Home Assistant, go to "Settings" -> "Device & services"
Expand Down
73 changes: 73 additions & 0 deletions linux_voice_assistant.egg-info/PKG-INFO
@@ -0,0 +1,73 @@
Metadata-Version: 2.4
Name: linux-voice-assistant
Version: 1.0.0
Summary: Linux voice assistant for Home Assistant using the ESPHome protocol
Author-email: The Home Assistant Authors <hello@home-assistant.io>
License: Apache-2.0
Project-URL: Source Code, http://github.com/OHF-Voice/linux-voice-assistant
Keywords: home,assistant,voice,esphome,linux
Platform: any
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Text Processing :: Linguistic
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.9.0
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: aioesphomeapi==37.2.1
Requires-Dist: sounddevice<1
Requires-Dist: numpy<3,>=2
Requires-Dist: pymicro-features==1.0.0
Provides-Extra: dev
Requires-Dist: black==24.8.0; extra == "dev"
Requires-Dist: flake8==7.2.0; extra == "dev"
Requires-Dist: mypy==1.14.0; extra == "dev"
Requires-Dist: pylint==3.2.7; extra == "dev"
Requires-Dist: pytest==8.3.5; extra == "dev"
Dynamic: license-file

# Linux Voice Assistant

Experimental Linux voice assistant for [Home Assistant][homeassistant] that uses the [ESPHome][esphome] protocol.

Runs on Linux `aarch64` and `x86_64` platforms. Tested with Python 3.13 and Python 3.11.
Supports announcements, start/continue conversation, and timers.

## Installation

Install system dependencies (`apt-get`):

* `libportaudio2` (for `sounddevice`)
* `build-essential` (for `pymicro-features`)
* `libmpv-dev` (for `python-mpv`)

Clone and install project:

``` sh
git clone https://github.com/OHF-Voice/linux-voice-assistant.git
cd linux-voice-assistant
script/setup
```

## Running

Use `script/run` or `python3 -m linux_voice_assistant`

See `--help` for more options.

## Connecting to Home Assistant

1. In Home Assistant, go to "Settings" -> "Device & services"
2. Click the "Add integration" button
3. Choose "ESPHome" and then "Set up another instance of ESPHome"
4. Enter the IP address of your voice satellite with port 6053
5. Click "Submit"
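Before adding the integration, it can help to confirm the satellite is actually listening on port 6053. A small reachability check (a hypothetical helper, not part of the project) might look like:

``` python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_port_open("192.168.1.50", 6053)` should return `True` once the satellite is running at that address.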

<!-- Links -->
[homeassistant]: https://www.home-assistant.io/
[esphome]: https://esphome.io/
19 changes: 19 additions & 0 deletions linux_voice_assistant.egg-info/SOURCES.txt
@@ -0,0 +1,19 @@
LICENSE.md
README.md
pyproject.toml
setup.cfg
linux_voice_assistant/__init__.py
linux_voice_assistant/__main__.py
linux_voice_assistant/api_server.py
linux_voice_assistant/entity.py
linux_voice_assistant/event_bus.py
linux_voice_assistant/event_led.py
linux_voice_assistant/microwakeword.py
linux_voice_assistant/mpv_player.py
linux_voice_assistant/util.py
linux_voice_assistant.egg-info/PKG-INFO
linux_voice_assistant.egg-info/SOURCES.txt
linux_voice_assistant.egg-info/dependency_links.txt
linux_voice_assistant.egg-info/requires.txt
linux_voice_assistant.egg-info/top_level.txt
linux_voice_assistant.egg-info/zip-safe
1 change: 1 addition & 0 deletions linux_voice_assistant.egg-info/dependency_links.txt
@@ -0,0 +1 @@

11 changes: 11 additions & 0 deletions linux_voice_assistant.egg-info/requires.txt
@@ -0,0 +1,11 @@
aioesphomeapi==37.2.1
sounddevice<1
numpy<3,>=2
pymicro-features==1.0.0

[dev]
black==24.8.0
flake8==7.2.0
mypy==1.14.0
pylint==3.2.7
pytest==8.3.5
1 change: 1 addition & 0 deletions linux_voice_assistant.egg-info/top_level.txt
@@ -0,0 +1 @@
linux_voice_assistant
1 change: 1 addition & 0 deletions linux_voice_assistant.egg-info/zip-safe
@@ -0,0 +1 @@

85 changes: 70 additions & 15 deletions linux_voice_assistant/__main__.py
Expand Up @@ -42,9 +42,13 @@
from .api_server import APIServer
from .entity import ESPHomeEntity, MediaPlayerEntity
from .microwakeword import MicroWakeWord
from .openwakeword_client import WyomingWakeClient
from .mpv_player import MpvMediaPlayer
from .util import call_all, get_mac, is_arm

from .event_bus import EventBus
from .event_led import LedEvent

_LOGGER = logging.getLogger(__name__)
_MODULE_DIR = Path(__file__).parent
_REPO_DIR = _MODULE_DIR.parent
Expand Down Expand Up @@ -78,8 +82,12 @@ class ServerState:
tts_player: MpvMediaPlayer
wakeup_sound: str
timer_finished_sound: str
loop: asyncio.AbstractEventLoop
event_bus: EventBus
media_player_entity: Optional[MediaPlayerEntity] = None
satellite: "Optional[VoiceSatelliteProtocol]" = None
wyoming_wake: Optional[WyomingWakeClient] = None
use_wyoming_wake: bool = False


# -----------------------------------------------------------------------------
Expand Down Expand Up @@ -110,11 +118,16 @@ def __init__(self, state: ServerState) -> None:
self._continue_conversation = False
self._timer_finished = False

self.state.event_bus.publish('ready', {})
_LOGGER.info('System is ready!')

def handle_voice_event(
self, event_type: VoiceAssistantEventType, data: Dict[str, str]
) -> None:
_LOGGER.debug("Voice event: type=%s, data=%s", event_type.name, data)

self.state.event_bus.publish(f'voice_{event_type.name}', data)

if event_type == VoiceAssistantEventType.VOICE_ASSISTANT_RUN_START:
self._tts_url = data.get("url")
self._tts_played = False
Expand Down Expand Up @@ -157,6 +170,7 @@ def handle_timer_event(
self._play_timer_finished()

def handle_message(self, msg: message.Message) -> Iterable[message.Message]:
_LOGGER.debug("message %s", type(msg).__name__)
if isinstance(msg, VoiceAssistantEventResponse):
# Pipeline event
data: Dict[str, str] = {}
Expand Down Expand Up @@ -197,14 +211,11 @@ def handle_message(self, msg: message.Message) -> Iterable[message.Message]:
| VoiceAssistantFeature.TIMERS
),
)
elif isinstance(
msg,
(
ListEntitiesRequest,
SubscribeHomeAssistantStatesRequest,
MediaPlayerCommandRequest,
),
):
elif isinstance(msg, (
ListEntitiesRequest,
SubscribeHomeAssistantStatesRequest,
MediaPlayerCommandRequest,
)):
for entity in self.state.entities:
yield from entity.handle_message(msg)

Expand Down Expand Up @@ -245,13 +256,13 @@ def handle_message(self, msg: message.Message) -> Iterable[message.Message]:
break

def handle_audio(self, audio_chunk: bytes) -> None:

if not self._is_streaming_audio:
return

self.send_messages([VoiceAssistantAudio(data=audio_chunk)])

def wakeup(self) -> None:
# TODO: Consider delaying the timer instead of stopping it
if self._timer_finished:
# Stop timer instead
self._timer_finished = False
Expand All @@ -264,6 +275,10 @@ def wakeup(self) -> None:
self.send_messages(
[VoiceAssistantRequest(start=True, wake_word_phrase=wake_word_phrase)]
)

self.state.event_bus.publish("voice_wakeword", {"wake_word_phrase": wake_word_phrase})

self.duck()
self._is_streaming_audio = True
self.state.tts_player.play(self.state.wakeup_sound)
Expand All @@ -286,6 +301,8 @@ def play_tts(self) -> None:
self._tts_played = True
_LOGGER.debug("Playing TTS response: %s", self._tts_url)

self.state.event_bus.publish('voice_play_tts', {})

self.state.stop_word.is_active = True
self.state.tts_player.play(self._tts_url, done_callback=self._tts_finished)

Expand All @@ -301,6 +318,9 @@ def _tts_finished(self) -> None:
self.state.stop_word.is_active = False
self.send_messages([VoiceAssistantAnnounceFinished()])

# Published at the actual time TTS playback finishes
self.state.event_bus.publish("voice_tts_finished", {})

if self._continue_conversation:
self.send_messages([VoiceAssistantRequest(start=True)])
self._is_streaming_audio = True
Expand Down Expand Up @@ -328,6 +348,9 @@ def connection_lost(self, exc):


def process_audio(state: ServerState):
# Debug counter for audio chunks forwarded to the wake word engine
chunks_sent = 0

try:
while True:
Expand All @@ -340,11 +363,18 @@ def process_audio(state: ServerState):

try:
state.satellite.handle_audio(audio_chunk)

if state.wake_word.is_active and state.wake_word.process_streaming(
audio_chunk
):
state.satellite.wakeup()
chunks_sent += 1
if state.use_wyoming_wake and chunks_sent % 100 == 0:
# 0.064 s per chunk assumes 1024 samples at 16 kHz
_LOGGER.debug("[OWW] chunks_sent=%d (approx %.1f s of audio)", chunks_sent, chunks_sent * 0.064)

if state.use_wyoming_wake:
if state.wyoming_wake:
state.wyoming_wake.send_audio_chunk(audio_chunk)
else:
if state.wake_word.is_active and state.wake_word.process_streaming(
audio_chunk
):
state.satellite.wakeup()

if state.stop_word.is_active and state.stop_word.process_streaming(
audio_chunk
Expand All @@ -363,6 +393,8 @@ def process_audio(state: ServerState):
async def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument("--name", required=True)
parser.add_argument("--wake-uri", help="Wyoming wake server URI (e.g., tcp://127.0.0.1:10400)")
parser.add_argument("--wake-word-name", help="Wake word name on the Wyoming server (e.g., hal)")
parser.add_argument(
"--audio-input-device",
default="default",
Expand Down Expand Up @@ -401,6 +433,12 @@ async def main() -> None:
logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO)
_LOGGER.debug(args)

use_wyoming = bool(args.wake_uri and args.wake_word_name)
wyoming_client = None
if use_wyoming:
_LOGGER.info("Using Wyoming openWakeWord at %s (name=%s)", args.wake_uri, args.wake_word_name)
wyoming_client = WyomingWakeClient(args.wake_uri, args.wake_word_name)

# Load available wake words
wake_word_dir = Path(args.wake_word_dir)
available_wake_words: Dict[str, AvailableWakeWord] = {}
Expand Down Expand Up @@ -432,6 +470,8 @@ async def main() -> None:
stop_config_path = wake_word_dir / f"{args.stop_model}.json"
_LOGGER.debug("Loading stop model: %s", stop_config_path)
stop_model = MicroWakeWord.from_config(stop_config_path, libtensorflowlite_c_path)

loop = asyncio.get_running_loop()

state = ServerState(
name=args.name,
Expand All @@ -441,12 +481,28 @@ async def main() -> None:
available_wake_words=available_wake_words,
wake_word=wake_model,
stop_word=stop_model,
event_bus=EventBus(),
loop=loop,
music_player=MpvMediaPlayer(device=args.audio_output_device),
tts_player=MpvMediaPlayer(device=args.audio_output_device),
wakeup_sound=args.wakeup_sound,
timer_finished_sound=args.timer_finished_sound,
wyoming_wake=wyoming_client,
use_wyoming_wake=use_wyoming,
)

LedEvent(state)

# Connect to Wyoming wake server if enabled
if state.use_wyoming_wake and state.wyoming_wake:
def _on_detect(_name, _ts):
_LOGGER.debug("[OWW] detection callback fired name=%s ts=%s", _name, _ts)
if state.satellite is not None:
state.loop.call_soon_threadsafe(state.satellite.wakeup)
state.wyoming_wake.connect(_on_detect)
state.wake_word.is_active = False
_LOGGER.debug("[OWW] Local MicroWakeWord disabled; streaming audio to Wyoming")
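The `call_soon_threadsafe` hand-off above, where a worker thread triggers `wakeup()` on the asyncio loop, follows the standard pattern for crossing from a thread into an event loop. A self-contained sketch with stand-in names:

``` python
import asyncio
import threading


def detection_thread(loop: asyncio.AbstractEventLoop, on_wake) -> None:
    """Stand-in for a wake word detector running off the event loop thread."""
    # A real detector would block on audio/network I/O before firing.
    loop.call_soon_threadsafe(on_wake)


async def main() -> bool:
    loop = asyncio.get_running_loop()
    woke = asyncio.Event()
    t = threading.Thread(target=detection_thread, args=(loop, woke.set), daemon=True)
    t.start()
    # Wait for the detector thread to schedule the wakeup on the loop
    await asyncio.wait_for(woke.wait(), timeout=2.0)
    return woke.is_set()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

`call_soon_threadsafe` is required here because asyncio event loops are not thread-safe; calling `woke.set()` directly from the thread would race with the loop.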

process_audio_thread = threading.Thread(
target=process_audio, args=(state,), daemon=True
)
Expand All @@ -455,7 +511,6 @@ async def main() -> None:
def sd_callback(indata, _frames, _time, _status):
state.audio_queue.put_nowait(bytes(indata))

loop = asyncio.get_running_loop()
server = await loop.create_server(
lambda: VoiceSatelliteProtocol(state), host=args.host, port=args.port
)
Expand Down
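The diff references an `EventBus` (from `event_bus.py`) and a `WyomingWakeClient` (from `openwakeword_client.py`) without showing their implementations. A minimal publish/subscribe bus consistent with the `publish(topic, data)` calls above might look like this; it is a sketch, not the project's actual `event_bus.py`:

``` python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Handler = Callable[[str, Dict[str, Any]], None]


class EventBus:
    """Minimal synchronous publish/subscribe bus."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, data: Dict[str, Any]) -> None:
        # Topics with no subscribers are silently ignored
        for handler in self._subscribers[topic]:
            handler(topic, data)
```

This synchronous design is enough for the `LedEvent(state)` subscriber pattern seen in the diff; a real implementation might instead dispatch onto the asyncio loop.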
9 binary files changed (contents not shown)