LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation

The rapid advancement of large language models (LLMs) has accelerated progress toward universal AI assistants. However, existing benchmarks for personalized assistants remain misaligned with real-world user-assistant interactions, failing to capture the complexity of external contexts and users' cognitive states.

To bridge this gap, we propose LifeSim, a user simulator that models user cognition with the Belief-Desire-Intention (BDI) model, situates it in physical environments to generate coherent life trajectories, and simulates intention-driven interactive user behavior. Building on LifeSim, we introduce LifeSim-Eval, a comprehensive benchmark for multi-scenario, long-horizon personalized assistance.

(Figure: LifeSim framework overview)

(Figure: LifeSim demo)



Quick Start

1. Environment Setup

Create a conda environment and install dependencies:

conda create -n lifesim python=3.10.12
conda activate lifesim
pip install -r requirements.txt

Note: The provided requirements.txt is comprehensive and includes GPU/vLLM-related packages. For a lightweight setup (e.g., API-only models without local GPU inference), you only need the core packages: openai, chromadb, sentence-transformers, flask, flask-cors, pyyaml, tqdm, numpy.


2. Data Preparation

The pipeline expects the following directory layout under data/:

data/
├── single_session/
│   ├── events.jsonl          # Event sequences 
│   └── users.jsonl           # User profiles
├── long_horizon/
│   ├── events.jsonl          # Event sequences 
│   └── users.jsonl           # User profiles
└── language_templates.json   # Preference dimension templates
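Before running the pipeline, the layout above can be sanity-checked with a short standard-library script (a minimal sketch, not part of the repo):

```python
from pathlib import Path

# Files the pipeline expects under data/ (from the layout above)
EXPECTED = [
    "single_session/events.jsonl",
    "single_session/users.jsonl",
    "long_horizon/events.jsonl",
    "long_horizon/users.jsonl",
    "language_templates.json",
]

def check_data_layout(root="data"):
    """Return the list of expected files missing under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]

if __name__ == "__main__":
    missing = check_data_layout()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Data layout OK")
```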

3. Model Setup

LifeSim requires two LLM backends — a user model (simulates the user) and an assistant model (the AI system under evaluation) — plus an embedding model for the retrieval memory.

Option A: Local Models via vLLM

Launch vLLM servers for the user and assistant models separately. For example:

# Launch user model (e.g., Qwen3-32B)
CUDA_VISIBLE_DEVICES=0,1 vllm serve /path/to/Qwen3-32B \
  --host 0.0.0.0 --port 8001 \
  --tensor-parallel-size 2 \
  --api-key your_api_key

# Launch assistant model (e.g., Llama-3-70B)
CUDA_VISIBLE_DEVICES=2,3 vllm serve /path/to/Llama-3-70B \
  --host 0.0.0.0 --port 8002 \
  --tensor-parallel-size 2 \
  --api-key your_api_key

Option B: Cloud API Models

No local deployment is needed. Supported model names (passed via --assistant_model_path):

| Provider  | Model Names |
|-----------|-------------|
| OpenAI    | gpt-4o, gpt-4o-mini, gpt-5, gpt-5-mini |
| DeepSeek  | deepseek-chat, deepseek-reasoner |
| Anthropic | claude-sonnet-4-5-20250929 |

Pass the corresponding API key via --assistant_model_api_key.
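Both options speak the same OpenAI-compatible chat completions protocol, which is why a single pair of `--*_model_url` / `--*_model_api_key` flags covers vLLM servers and cloud providers alike. A minimal sketch of the request shape (the `build_chat_request` helper is hypothetical, for illustration only):

```python
import json

def build_chat_request(model, messages, api_key):
    """Build headers and JSON body for an OpenAI-compatible
    POST /v1/chat/completions request, as accepted by both a
    local vLLM server and the cloud providers above."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return headers, body

headers, body = build_chat_request(
    "gpt-4o-mini",
    [{"role": "user", "content": "Hello"}],
    "your_api_key",
)
```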

Embedding Model

An embedding model is required for the ChromaDB-based retrieval memory. We recommend Qwen3-Embedding-0.6B (or any model compatible with sentence-transformers). Download it to a local path and pass it via --retriever_model_path.


4. Run Simulation

Use src/main_mp.py as the main entrypoint. All scripts are run from the repository root with PYTHONPATH=src.

PYTHONPATH=src python src/main_mp.py \
  --user_model_path      /path/to/User_Model \
  --user_model_url       your_user_model_url \
  --user_model_api_key   your_api_key \
  --assistant_model_path /path/to/Assistant_Model \
  --assistant_model_url  your_assistant_model_url \
  --assistant_model_api_key your_api_key \
  --retriever_model_path /path/to/Qwen3-Embedding-0.6B \
  --exp_setting          single_session \
  --n_events_per_sequence 10 \
  --n_threads            4 \
  --chromadb_root        ./chromadb \
  --logs_root            ./logs

5. Evaluation

Evaluation is a two-step pipeline using an LLM-as-a-judge approach across 7 dimensions:

| Metric | Dimension | Description |
|--------|-----------|-------------|
| ir  | Intent Recognition    | Whether the assistant correctly identifies the user's intent |
| ic  | Intent Completion     | Whether the assistant's reply fulfills each intent dimension |
| nat | Naturalness           | Fluency and conversational naturalness (1–5) |
| coh | Coherence             | Logical consistency and contextual continuity (1–5) |
| pa  | Preference Alignment  | Whether the reply aligns with the user's preference profile |
| ea  | Environment Alignment | Scene/environment feasibility and constraint awareness (1–5) |
| rr  | Rigid Reasoning       | Binary flag for failure to adapt after new constraints |

Step 1 — Generate LLM judge outputs (eval.py)

Run once per evaluator model. Results are saved under ./eval_outputs/{evaluator}/{theme}/.

for EVALUATOR in qwen3_32b; do
  PYTHONPATH=src python src/evaluation/eval.py \
    --logs_root   ./logs \
    --themes      main_user_Qwen3-32B_assistant_Qwen3-8B_total \
    --output_root ./eval_outputs/${EVALUATOR} \
    --evaluator   ${EVALUATOR} \
    --model_path  /path/to/evaluator_model \
    --base_url    http://0.0.0.0:8000/v1 \
    --api_key     your_api_key \
    --metrics     ir ic nat coh pa ea rr \
    --max_workers 32
done

Step 2 — Aggregate numeric scores (metric.py)

Pass all evaluators to --evaluators; scores are averaged across them automatically.

PYTHONPATH=src python src/evaluation/metric.py \
  --results_root ./eval_outputs \
  --models       main_user_Qwen3-32B_assistant_Qwen3-8B_total \
  --evaluators   qwen3_32b \
  --metrics      ir ic nat coh pa ea rr \
  --output_root  ./metric_outputs

Scores are printed to stdout and saved as {output_root}/{model}/scores.json.
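The cross-evaluator averaging in Step 2 can be sketched as follows (the per-evaluator file location and the metric-to-float mapping are assumptions for illustration, not metric.py's actual schema):

```python
import json
from pathlib import Path
from statistics import mean

def average_scores(results_root, model, evaluators, metrics):
    """Average each metric's score across evaluator outputs.

    Assumes each evaluator wrote a file at
    {results_root}/{evaluator}/{model}/scores.json mapping
    metric name -> float (hypothetical layout).
    """
    per_metric = {m: [] for m in metrics}
    for ev in evaluators:
        path = Path(results_root) / ev / model / "scores.json"
        scores = json.loads(path.read_text())
        for m in metrics:
            per_metric[m].append(scores[m])
    return {m: mean(vals) for m, vals in per_metric.items()}
```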


Web Demo

The web demo provides an interactive UI for two usage modes:

  • Live Generation — dynamically generates life events driven by the BDI model and lets you interact directly with the simulated user in real time.
  • Preset Demo — replays pre-generated trajectory data with an animated map timeline; click any node to view event details and chat with the simulated user.

Step 1: Create config.yaml

A full annotated template is provided at demo/config_template.yaml. Copy it and fill in your model paths, API keys, and retriever settings. The config is only required for the Live Generation mode (real-time user-model chat); the Preset Demo mode works without it.
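For orientation, Live Generation needs the same three backends as the CLI (user model, assistant model, retriever), so the config typically carries a path/URL/API-key triple per model. All field names below are illustrative assumptions; consult demo/config_template.yaml for the actual schema:

```yaml
# Illustrative sketch only -- see demo/config_template.yaml for real field names
user_model:
  path: /path/to/User_Model
  url: your_user_model_url
  api_key: your_api_key
assistant_model:
  path: /path/to/Assistant_Model
  url: your_assistant_model_url
  api_key: your_api_key
retriever:
  model_path: /path/to/Qwen3-Embedding-0.6B
```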

Step 2: Launch the Server

cd /path/to/lifesim/demo

python app.py \
  --events-path /path/to/data/single_session/events.jsonl \
  --users-path  /path/to/data/single_session/users.jsonl \
  --config      /path/to/config.yaml \
  --port        5020

Then open http://localhost:5020 in your browser.

Citation

If you use LifeSim or LifeSim-Eval in your research, please cite:

@misc{duan2026lifesimlonghorizonuserlife,
  title={LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation}, 
  author={Feiyu Duan and Xuanjing Huang and Zhongyu Wei},
  year={2026},
  eprint={2603.12152},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.12152}, 
}
