caskcsg/CtrlBenchRec

Overview

CtrlBench-Rec is an evolutionary multi-agent framework with three modules: Initialization, Dynamic Interaction, and Collaborative Fusion. Operating as a closed-loop system, it iterates through initialization, policy alignment, and agent fusion to accelerate group exploration and cultivate elite agents. The central objective is to transform novice agents into a refined set of high-capability super probes that serve as a standardized benchmark for system controllability. The framework operates in two sequential phases: (1) Training phase, refining super probes through interaction and fusion; and (2) Inference and evaluation phase, deploying the probes for multi-dimensional controllability assessments.

Project Structure


├── data/                        # Datasets (ML-1M, preprocessed Amazon Toys & Games)
├── model/                       # Recommendation model definitions (e.g., SASRec, Narm, Qwen)
├── encoder/                     # Textual Encoder for Embedding Generation (e.g., twhin-bert)
├── generated_user_profile/      # User profiles generated at different stages
├── tool/                        # Data loaders and embedding processors
├── runner/                      # Scripts for training, inference, and evaluation (e.g., epoch.py, evaluation.py)
└── requirements.txt             # Project dependencies

Framework and Workflow

(Framework diagram)

Phase I: Evolutionary Training

  1. Multi-Agent Initialization: Extract static attributes and dynamic interaction trajectories from raw datasets such as ML-1M, then instantiate agents with a profile expert, an LLM-based decision engine, and tool-calling modules.
  2. Environment Interaction & Behavior Alignment: Synchronize the black-box system's state with the agent's persona by injecting a continuous stream of profile-aligned interaction behaviors.
  3. Multi-Agent Strategy Fusion: Group agents via K-means clustering to facilitate intra-cluster discussions, then have a fusion expert integrate the discussion records and profiles into new Super Probes.
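The clustering in the fusion step can be sketched as follows. This is an illustrative example over random embeddings, not the repo's implementation; the cluster count, embedding dimension, and variable names are assumptions:

```python
# Hypothetical sketch: grouping agent profile embeddings with K-means
# to form the intra-cluster discussion groups used in Strategy Fusion.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
agent_embeddings = rng.normal(size=(30, 8))  # 30 agents, 8-dim profile vectors

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(agent_embeddings)

# Discussion groups: cluster id -> list of agent indices.
clusters = {c: np.flatnonzero(labels == c).tolist() for c in range(5)}
```

Each group would then hold its intra-cluster discussion before the fusion expert merges the records into a Super Probe.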

Phase II: Inference & Evaluation

  1. Interaction & Behavior Acquisition: Execute multi-turn interactions with the black-box recommender to generate a profile-aligned behavioral stream for the Super Probes.
  2. Systematic Evaluation: Use the acquired behavioral streams to run multi-dimensional controllability assessments.

🚀 Quick Start

Prerequisites

| Tool | Version | Description | Check Installation |
|------|---------|-------------|--------------------|
| Python | 3.10 | Backend runtime | python --version |

Installation & Setup

1. Environment Configuration
Run the following command in your terminal to install the necessary dependencies:

pip install -r requirements.txt

2. Load the BERT Encoder
Execute the script to load the twhin-bert encoder:

python runner/load_twhin_bert.py
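The contents of load_twhin_bert.py are not shown here. What an encoder step like this ultimately produces is one embedding vector per text, commonly via masked mean pooling over the encoder's token-level hidden states. A minimal numpy sketch of that pooling (shapes and values are illustrative placeholders, not actual twhin-bert output):

```python
import numpy as np

# Illustrative shapes: batch of 2 texts, 6 token positions, 768-dim hidden states.
hidden_states = np.ones((2, 6, 768))
attention_mask = np.array([[1, 1, 1, 1, 0, 0],
                           [1, 1, 1, 0, 0, 0]])  # 1 = real token, 0 = padding

# Mean-pool only over real (non-padding) tokens to get one vector per text.
mask = attention_mask[:, :, None]
embeddings = (hidden_states * mask).sum(axis=1) / mask.sum(axis=1)
```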

3. Configuration (API Key)
To use the DeepSeek LLM features, you need an API key from https://platform.deepseek.com/api_keys. You can pass it as an environment variable at runtime without permanently modifying your system settings.

For Linux / macOS / WSL, prefix your command with the variable:

DEEPSEEK_API_KEY="your_api_key_here" python ../runner/user_profile_initialize.py

For Windows (PowerShell), set the variable for the current session before running the script:

$env:DEEPSEEK_API_KEY="your_api_key_here"; python ../runner/user_profile_initialize.py
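Inside the scripts, the key can then be read from the environment. A minimal sketch of that pattern (the helper name is an assumption, not the repo's code):

```python
import os

def get_deepseek_api_key() -> str:
    """Read the DeepSeek API key from the environment; fail fast if missing."""
    key = os.environ.get("DEEPSEEK_API_KEY")
    if not key:
        raise RuntimeError(
            "DEEPSEEK_API_KEY is not set; export it or prefix your command with it."
        )
    return key
```

Failing fast with a clear message avoids confusing downstream authentication errors when the variable was never set.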

You can also download the twhin-bert model manually from https://huggingface.co/Twitter/twhin-bert-base; after downloading, place it in the "../rec_models/" directory.

Experiments

We provide experiments using the SASRec recommendation model on the ML-1M dataset, centered on Task 1 (Target Content Discovery Analysis).

Phase I: Evolutionary Training

  1. Multi-Agent Initialization: Initialize the agent metadata.
python runner/user_profile_initialize.py
  2. Interaction & Fusion: Update the entry point in epoch.py to call runner.epoch.sasrec_ml1m_merge, then run the script.
python runner/epoch.py

Phase II: Inference & Evaluation

  1. Interaction & Behavior Acquisition: Update the entry point in epoch.py to call runner.epoch.sasrec_ml1m_debate_epoch20, then run the script.
python runner/epoch.py
  2. Systematic Evaluation: Invoke runner.evaluation.compare_two_profile, set the original and evaluation profile paths, then run evaluation.py for results.
python runner/evaluation.py
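The internals of compare_two_profile are not documented here. One common way to compare an original profile with a regenerated one is cosine similarity over their embedding vectors; a hypothetical sketch (the function name and metric choice are assumptions, not the repo's implementation):

```python
import numpy as np

def profile_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two profile embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

original = np.array([0.2, 0.5, 0.3])
evaluated = np.array([0.2, 0.5, 0.3])
score = profile_similarity(original, evaluated)  # ~1.0 for identical profiles
```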

Results

Following the experimental setup and evaluation metrics detailed in Section 5.2, "Controllability across different recommender system architectures," we conducted a series of experiments on the SASRec model. The results are presented below:

All results are on MovieLens-1M.

| Interaction Rounds (t) | Coverage (%) ↑ base(100) | Coverage (%) ↑ base_small(27) | Coverage (%) ↑ CtrlBench-Rec | Exploration Efficiency ↓ base(100) | Exploration Efficiency ↓ base_small(27) | Exploration Efficiency ↓ CtrlBench-Rec |
|---|---|---|---|---|---|---|
| t=5 | 5.6% | 2.05% | 2.33% | 2.89 | 2.06 | 1.31 |
| t=10 | 9.68% | 3.91% | 4.71% | 2.98 | 1.84 | 1.39 |
| t=15 | 12.20% | 5.10% | 7.23% | 3.54 | 2.02 | 1.45 |
| t=20 | 15.46% | 6.56% | 8.95% | 3.78 | 2.16 | 1.58 |
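The exact metric definitions follow Section 5.2 of the paper and are not reproduced here. As a rough illustration of what a coverage-style metric computes, here is a hypothetical helper; the function name and definition are assumptions, not the repo's implementation:

```python
def coverage(discovered_items: set, target_items: set) -> float:
    """Hypothetical coverage metric: fraction of the target item set
    surfaced by the probes within a given number of interaction rounds."""
    return len(discovered_items & target_items) / len(target_items)

# e.g. discovering 3 of 20 target items gives 0.15 (15%) coverage
score = coverage({2, 7, 11}, set(range(20)))
```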
