[ICLR26 Oral] SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents

SimuHome is a time-accelerated smart home simulator and benchmark for LLM-based agents, grounded in the Matter protocol. Device actions continuously affect environmental variables (e.g., temperature, humidity), and agents must reason over these changes. It also supports virtual-time workflow scheduling, so agents can queue multi-step actions for future execution and coordinate them with time-sensitive goals.

Quick Start

1. Install

pip install uv
git clone https://github.com/holi-lab/SimuHome.git
cd SimuHome
uv sync

2. Set API Keys

Create .env in the project root:

OPENAI_API_KEY=your_openai_key_here            # Required
OPENROUTER_API_KEY=your_openrouter_key_here    # Required for default eval_spec.example.yaml

If you use only local models (with api_key: null), you can omit OPENROUTER_API_KEY.
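Before launching anything, it can help to confirm that the keys are actually present in .env. The following is a minimal sketch, not part of SimuHome: the helper names `read_env_keys` and `missing_keys` are hypothetical, and the parsing assumes simple unquoted `KEY=value` lines with optional trailing `# ...` comments, as in the example above.

```python
from pathlib import Path


def read_env_keys(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; skip blanks and full-line comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "   # Required".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(text: str, required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    values = read_env_keys(text)
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")  # assumed location: project root
    text = env_path.read_text() if env_path.exists() else ""
    # Drop OPENROUTER_API_KEY from this list if you only use local models.
    print(missing_keys(text, ["OPENAI_API_KEY", "OPENROUTER_API_KEY"]))
```

An empty list means both keys are set. Quoted values and multi-line entries are deliberately out of scope for this sketch.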

3. Start the Simulator

uv run simuhome server-start   # Launch simulator on port 8000
uv run simuhome health         # Verify it's running
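When scripting around the simulator, you may want to wait until something is listening on port 8000 before moving on. The sketch below only tests whether a TCP connection can be opened. It does not call any SimuHome endpoint, and it is not a substitute for `simuhome health`. The function names are hypothetical, and the host `127.0.0.1` is an assumption.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll until the port accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False
```

For example, `wait_for_port("127.0.0.1", 8000)` returns True as soon as a listener accepts connections on the simulator's port.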

Evaluation

The current benchmark snapshot ships with the repository in data/benchmark/.

Important

Benchmark version note
The benchmark in data/benchmark/ is the current snapshot for this repository. It is reproducible under the same seeds and settings within the current simulator version.

After the paper experiments, the simulator was enhanced for higher-fidelity smart-home simulation, and the benchmark episodes were regenerated with the updated simulator. The current snapshot is therefore not identical to the one used to produce Table 1 in the paper, and results from the current repository should not be expected to match the paper's reported numbers exactly, even with the same evaluation protocol, ReAct prompt, and model setup.

1. Configure eval_spec.example.yaml:

schema: simuhome-eval-spec-v1

run:
  id: example_qt1_seed_1_3_5
  output_root: experiments

episode:
  dir: data/benchmark
  qt: qt1
  case: feasible
  seed: "1 - 3, 5"

strategy:
  name: react
  timeout: 60
  temperature: 0.0
  max_steps: 20

orchestration:
  max_workers: 2
  simulator_start_timeout: 30
  simulator_start_retries: 1
  evaluation_retries: 1
  allow_partial_start: true

api:
  base: https://api.openai.com/v1
  key: env:OPENAI_API_KEY

judge:
  model: gpt-5-mini
  api_base: https://api.openai.com/v1
  api_key: env:OPENAI_API_KEY

models:
  - model: openai/gpt-4.1
    api_base: https://openrouter.ai/api/v1
    api_key: env:OPENROUTER_API_KEY

  - model: qwen3-30b-instruct
    api_base: http://127.0.0.1:8000/v1
    api_key: null
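Two conventions in the spec above are worth illustrating: the seed expression (`"1 - 3, 5"`) and the `env:` prefix on API keys. The sketch below shows one plausible reading of each. It is not SimuHome's own parser. It assumes that `a - b` denotes an inclusive range and that `env:NAME` means "read NAME from the environment". The function names `expand_seeds` and `resolve_key` are hypothetical.

```python
import os
from typing import Mapping, Optional


def expand_seeds(expr: str) -> list[int]:
    """Expand a string such as "1 - 3, 5" into [1, 2, 3, 5] (assumed inclusive ranges)."""
    seeds: list[int] = []
    for part in expr.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(piece) for piece in part.split("-", 1))
            if low > high:
                raise ValueError(f"descending range: {part!r}")
            seeds.extend(range(low, high + 1))
        else:
            seeds.append(int(part))
    return seeds


def resolve_key(value: Optional[str], environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Resolve "env:NAME" from the environment; pass None and literals through."""
    if value is None:
        return None  # e.g. a local model with api_key: null
    if value.startswith("env:"):
        name = value[len("env:"):]
        if name not in environ:
            raise KeyError(f"environment variable {name} is not set")
        return environ[name]
    return value
```

Under these assumptions, `"1 - 3, 5"` and `"1-3,5"` expand to the same seed list, so the spacing difference between the two example specs would not matter.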

2. Run:

uv run simuhome eval-start --spec eval_spec.example.yaml
uv run simuhome eval-resume --resume experiments/<run_id>    # Resume if interrupted

3. Aggregate results:

uv run simuhome aggregate --dir experiments/<run_id>/<model>
uv run simuhome aggregate-all --dir experiments/<run_id>

Episode Generation

Generate custom episodes beyond the included benchmark.

1. Configure gen_spec.example.yaml:

schema: simuhome-gen-spec-v1

run:
  id: gen_example_qt1_seed_1_3_5
  output_root: experiments

episode:
  qt: qt1
  case: feasible
  seed: "1-3,5"
  base_date: "2025-08-23"
  home:
    room_count: 5
    devices_per_room:
      min: 4
      max: 7
    environment:
      temperature_c:  # Temperature in Celsius (allowed: 18.0-36.0)
        min: 22
        max: 32
      humidity_pct:  # Humidity in percent (allowed: 15.0-85.0)
        min: 35
        max: 65
      illuminance_lux:  # Illuminance in lux (allowed: 10.0-3000.0)
        min: 100
        max: 1500
      pm10_ugm3:  # PM10 in ug/m3 (allowed: 5.0-120.0)
        min: 20
        max: 100

llm:
  model: gpt-5-mini
  api_base: https://openrouter.ai/api/v1
  api_key: env:OPENROUTER_API_KEY
  temperature: 1  # Sampling temperature (gpt-5-mini requires 1)
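The comments in the environment block above list an allowed range for each variable. A quick pre-flight check of your own values against those ranges might look like the sketch below. The bounds are copied from the comments in the example spec. The function name `check_environment` and the error wording are hypothetical, and SimuHome may validate these values differently.

```python
# Allowed ranges as stated in the comments of gen_spec.example.yaml.
ALLOWED = {
    "temperature_c": (18.0, 36.0),
    "humidity_pct": (15.0, 85.0),
    "illuminance_lux": (10.0, 3000.0),
    "pm10_ugm3": (5.0, 120.0),
}


def check_environment(environment: dict) -> list[str]:
    """Return a list of problems; an empty list means every range is acceptable."""
    problems: list[str] = []
    for name, bounds in environment.items():
        if name not in ALLOWED:
            problems.append(f"{name}: unknown variable")
            continue
        low, high = ALLOWED[name]
        minimum, maximum = bounds["min"], bounds["max"]
        if minimum > maximum:
            problems.append(f"{name}: min {minimum} exceeds max {maximum}")
        if minimum < low or maximum > high:
            problems.append(f"{name}: [{minimum}, {maximum}] is outside [{low}, {high}]")
    return problems


# The values from the example spec pass the check.
example = {
    "temperature_c": {"min": 22, "max": 32},
    "humidity_pct": {"min": 35, "max": 65},
    "illuminance_lux": {"min": 100, "max": 1500},
    "pm10_ugm3": {"min": 20, "max": 100},
}
print(check_environment(example))
```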

2. Run:

uv run simuhome episode --spec gen_spec.example.yaml
uv run simuhome episode-resume --resume experiments/<run_id>  # Resume if interrupted

Citation

@inproceedings{
  seo2026simuhome,
  title={SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home {LLM} Agents},
  author={Gyuhyeon Seo and Jungwoo Yang and Junseong Pyo and Nalim Kim and Jonggeun Lee and Yohan Jo},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=LCS1WsGvha}
}

License

This project is licensed under CC BY-NC-ND 4.0.
You may share this work for non-commercial purposes with appropriate credit. Commercial use and derivative works are not permitted.
