SimuHome is a time-accelerated smart home simulator and benchmark for LLM-based agents, grounded in the Matter protocol. Device actions continuously affect environmental variables (e.g., temperature, humidity), and agents must reason over these changes. It also supports virtual-time workflow scheduling, so agents can queue multi-step actions for future execution and coordinate them with time-sensitive goals.
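To make the idea of virtual-time scheduling concrete, here is a minimal sketch of a toy room whose temperature responds to a heater, with actions queued against a virtual clock. It is an illustration only and not SimuHome's implementation: the `VirtualHome` class, its methods, and the heating rate are all hypothetical.

```python
import heapq


class VirtualHome:
    """Toy model (hypothetical): a heater warms the room at a fixed rate while on."""

    HEAT_RATE_C_PER_S = 0.01  # assumed rate, chosen only for illustration

    def __init__(self, temperature_c=20.0):
        self.temperature_c = temperature_c
        self.heater_on = False
        self.now_s = 0.0   # virtual seconds since start
        self._queue = []   # min-heap of (time, insertion order, action)
        self._counter = 0  # tie-breaker so equal-time actions keep insertion order

    def schedule(self, at_s, action):
        """Queue a callable to run when the virtual clock reaches at_s."""
        heapq.heappush(self._queue, (at_s, self._counter, action))
        self._counter += 1

    def _advance(self, to_s):
        """Move the clock forward and apply the heater's effect over the gap."""
        if self.heater_on:
            self.temperature_c += self.HEAT_RATE_C_PER_S * (to_s - self.now_s)
        self.now_s = to_s

    def run_until(self, end_s):
        """Jump from event to event instead of waiting in real time."""
        while self._queue and self._queue[0][0] <= end_s:
            at_s, _, action = heapq.heappop(self._queue)
            self._advance(at_s)
            action(self)
        self._advance(end_s)


home = VirtualHome()
home.schedule(600, lambda h: setattr(h, "heater_on", True))    # heater on at 10 min
home.schedule(1800, lambda h: setattr(h, "heater_on", False))  # heater off at 30 min
home.run_until(3600)  # one virtual hour, computed immediately
print(home.now_s, home.temperature_c)
```

In this toy model the heater runs for 1200 virtual seconds, so the room warms by about 12 °C. The point is that the environment state depends on when each queued action fires, which is the kind of dependency an agent has to reason about.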
pip install uv
git clone https://github.com/holi-lab/SimuHome.git
cd SimuHome
uv sync

Create .env in the project root:
OPENAI_API_KEY=your_openai_key_here # Required
OPENROUTER_API_KEY=your_openrouter_key_here # Required for default eval_spec.example.yaml

If you use only local models (with api_key: null), you can omit OPENROUTER_API_KEY.
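Before launching anything, it can help to confirm that the keys are actually filled in. The following is a small standalone sketch and not part of SimuHome. The helper names `parse_env_text`, `missing_keys`, and `check_env_file` are hypothetical, and the parsing rules (trailing ` # ...` notes are ignored, values starting with `your_` count as placeholders) are assumptions based on the example above.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse KEY=value lines, skipping blanks, comment lines, and trailing ' # ...' notes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def missing_keys(env, required):
    """Return required keys that are absent, empty, or still placeholder text."""
    return [k for k in required if not env.get(k) or env[k].startswith("your_")]


def check_env_file(path=".env", required=("OPENAI_API_KEY", "OPENROUTER_API_KEY")):
    """Read the file at path and report which required keys still need a value."""
    return missing_keys(parse_env_text(Path(path).read_text(encoding="utf-8")), required)
```

Calling `check_env_file()` from the project root returns an empty list when both keys are set. For a local-model-only setup, pass `required=("OPENAI_API_KEY",)`.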
uv run simuhome server-start # Launch simulator on port 8000
uv run simuhome health # Verify it's running

The current benchmark snapshot is pre-included in data/benchmark/.
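In scripted setups the simulator may need a moment before the health check passes. The sketch below simply re-runs the documented `uv run simuhome health` command until it succeeds. It assumes, without confirmation from this README, that the command exits with a non-zero status while the simulator is unreachable. The function name `wait_until_healthy` and its parameters are hypothetical.

```python
import subprocess
import time


def wait_until_healthy(cmd=("uv", "run", "simuhome", "health"), attempts=10, delay_s=1.0):
    """Re-run the health command until it exits with status 0 or attempts run out.

    Assumption: a non-zero exit status means the simulator is not ready yet.
    """
    for attempt in range(attempts):
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        if result.returncode == 0:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)  # pause before the next try
    return False
```

A caller would start the server in another terminal or process, then gate the evaluation on `wait_until_healthy()` returning `True`.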
Important
Benchmark version note
The benchmark included in data/benchmark/ is the current reproducible benchmark snapshot for this repository.
It is reproducible under the same seeds and settings within the current simulator version.
After the paper experiments, the simulator was enhanced for higher-fidelity smart-home simulation, and the benchmark episodes were regenerated based on this updated simulator. As a result, the current benchmark snapshot is not identical to the one used to produce Table 1 in the paper. Therefore, results obtained from the current repository should not be expected to exactly match the paper's reported numbers, even when using the same evaluation protocol, ReAct prompt, and model setup.
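Because reproducibility depends on running the same seeds, it is worth being explicit about which seeds a spec selects. The example specs in this README write seeds as compact strings such as "1 - 3, 5". The sketch below expands such a string on the assumption that it means inclusive ranges plus single values. That reading is suggested by run ids like example_qt1_seed_1_3_5 but is not confirmed here, and `expand_seeds` is a hypothetical helper, not SimuHome's parser.

```python
def expand_seeds(spec):
    """Expand a string like "1 - 3, 5" into a sorted list of unique integer seeds.

    Assumption: "a-b" is an inclusive range and commas separate entries.
    """
    seeds = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)  # int() tolerates surrounding spaces
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            seeds.update(range(start, end + 1))
        else:
            seeds.add(int(part))
    return sorted(seeds)


print(expand_seeds("1 - 3, 5"))
```

Under that assumption, both spellings used in the example specs ("1 - 3, 5" and "1-3,5") select the same four seeds.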
1. Configure eval_spec.example.yaml:
eval_spec.example.yaml
schema: simuhome-eval-spec-v1
run:
  id: example_qt1_seed_1_3_5
  output_root: experiments
episode:
  dir: data/benchmark
  qt: qt1
  case: feasible
  seed: "1 - 3, 5"
strategy:
  name: react
  timeout: 60
  temperature: 0.0
  max_steps: 20
orchestration:
  max_workers: 2
  simulator_start_timeout: 30
  simulator_start_retries: 1
  evaluation_retries: 1
  allow_partial_start: true
api:
  base: https://api.openai.com/v1
  key: env:OPENAI_API_KEY
judge:
  model: gpt-5-mini
  api_base: https://api.openai.com/v1
  api_key: env:OPENAI_API_KEY
models:
  - model: openai/gpt-4.1
    api_base: https://openrouter.ai/api/v1
    api_key: env:OPENROUTER_API_KEY
  - model: qwen3-30b-instruct
    api_base: http://127.0.0.1:8000/v1
    api_key: null

2. Run:
uv run simuhome eval-start --spec eval_spec.example.yaml
uv run simuhome eval-resume --resume experiments/<run_id> # Resume if interrupted

3. Aggregate results:
uv run simuhome aggregate --dir experiments/<run_id>/<model>
uv run simuhome aggregate-all --dir experiments/<run_id>

Generate custom episodes beyond the included benchmark.
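The example generation spec that follows annotates each environment variable with an allowed range in its comments. When customizing those ranges, a quick pre-check can catch out-of-bounds values before a generation run. This is a sketch only: the bounds are copied from the comments in gen_spec.example.yaml, while the `ALLOWED` table and `validate_environment` function are hypothetical and say nothing about how SimuHome itself validates a spec.

```python
# Allowed bounds as stated in the comments of gen_spec.example.yaml.
ALLOWED = {
    "temperature_c": (18.0, 36.0),
    "humidity_pct": (15.0, 85.0),
    "illuminance_lux": (10.0, 3000.0),
    "pm10_ugm3": (5.0, 120.0),
}


def validate_environment(environment):
    """Return a list of human-readable problems; an empty list means all ranges fit."""
    problems = []
    for name, bounds in environment.items():
        if name not in ALLOWED:
            problems.append(f"{name}: unknown variable")
            continue
        low, high = ALLOWED[name]
        lo_val, hi_val = bounds["min"], bounds["max"]
        if lo_val > hi_val:
            problems.append(f"{name}: min {lo_val} exceeds max {hi_val}")
        if lo_val < low or hi_val > high:
            problems.append(f"{name}: [{lo_val}, {hi_val}] outside allowed [{low}, {high}]")
    return problems


# Values taken from the example spec.
example = {
    "temperature_c": {"min": 22, "max": 32},
    "humidity_pct": {"min": 35, "max": 65},
    "illuminance_lux": {"min": 100, "max": 1500},
    "pm10_ugm3": {"min": 20, "max": 100},
}
print(validate_environment(example))
```

The example values all fall inside their stated bounds, so the check reports no problems for them.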
1. Configure gen_spec.example.yaml:
gen_spec.example.yaml
schema: simuhome-gen-spec-v1
run:
  id: gen_example_qt1_seed_1_3_5
  output_root: experiments
episode:
  qt: qt1
  case: feasible
  seed: "1-3,5"
  base_date: "2025-08-23"
home:
  room_count: 5
  devices_per_room:
    min: 4
    max: 7
environment:
  temperature_c: # Temperature in Celsius (allowed: 18.0-36.0)
    min: 22
    max: 32
  humidity_pct: # Humidity in percent (allowed: 15.0-85.0)
    min: 35
    max: 65
  illuminance_lux: # Illuminance in lux (allowed: 10.0-3000.0)
    min: 100
    max: 1500
  pm10_ugm3: # PM10 in ug/m3 (allowed: 5.0-120.0)
    min: 20
    max: 100
llm:
  model: gpt-5-mini
  api_base: https://openrouter.ai/api/v1
  api_key: env:OPENROUTER_API_KEY
  temperature: 1 # Sampling temperature (gpt-5-mini requires 1)

2. Run:
uv run simuhome episode --spec gen_spec.example.yaml
uv run simuhome episode-resume --resume experiments/<run_id> # Resume if interrupted

@inproceedings{seo2026simuhome,
  title={SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home {LLM} Agents},
  author={Gyuhyeon Seo and Jungwoo Yang and Junseong Pyo and Nalim Kim and Jonggeun Lee and Yohan Jo},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=LCS1WsGvha}
}

This project is licensed under CC BY-NC-ND 4.0.
You may share this work for non-commercial purposes with appropriate credit. Commercial use and derivative works are not permitted.

