CtrlBench-Rec is an evolutionary multi-agent framework with three modules: Initialization, Dynamic Interaction, and Collaborative Fusion. Operating as a closed-loop system, it iterates through initialization, policy alignment, and agent fusion to accelerate group exploration and cultivate elite agents. The central objective is to transform novice agents into a refined set of high-capability Super Probes that serve as a standardized benchmark for system controllability. The framework operates in two sequential phases: (1) a training phase, which refines the Super Probes through interaction and fusion; and (2) an inference and evaluation phase, which deploys the probes for multi-dimensional controllability assessments.
├── data/ # Datasets (ML-1M, preprocessed Amazon Toys & Games)
├── model/ # Recommendation model definitions (e.g., SASRec, Narm, Qwen)
├── encoder/ # Textual Encoder for Embedding Generation (e.g., twhin-bert)
├── generated_user_profile/ # User profiles generated at different stages
├── tool/ # Data loaders and embedding processors
├── runner/ # Scripts for training, inference, and evaluation (e.g., epoch.py, evaluation.py)
└── requirements.txt # Project dependencies
Phase I: Evolutionary Training
- Multi-Agent Initialization: Extract static attributes and dynamic trajectories from raw datasets such as ML-1M, then instantiate agents with a profile expert, an LLM-based decision engine, and tool-calling modules.
- Environment Interaction & Behavior Alignment: Synchronize the Black-Box system's state with the agent's persona by injecting a continuous stream of profile-aligned interaction behaviors.
- Multi-Agent Strategy Fusion: Group agents via K-means clustering to facilitate intra-cluster discussions; a fusion expert then integrates the discussion records and profiles into new Super Probes.
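The fusion step above groups similar agents before the intra-cluster discussion. As a minimal illustration of that grouping (the embedding source, dimensionality, and cluster count here are our own toy assumptions, not the repo's actual pipeline, which presumably clusters profile embeddings from the textual encoder), a bare-bones K-means over agent profile vectors might look like:

```python
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Minimal K-means: assign each agent profile vector to its nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    assignment = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment

# Toy agent profile embeddings: two visually obvious groups in 2-D.
profiles = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]]
clusters = kmeans(profiles, k=2)
```

Each cluster's members would then hold a discussion whose records feed the fusion expert.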
Phase II: Inference & Evaluation
- Interaction & Behavior Acquisition: Execute multi-turn interactions with the Black-Box recommender to generate a profile-aligned behavioral stream for the Super Probes.
- Systematic Evaluation
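The Phase II acquisition step can be pictured as a probe repeatedly querying the black-box recommender and feeding profile-consistent choices back as history. A minimal sketch (all names are illustrative; in the real system an LLM-based decision engine, not a keyword match, decides which items fit the persona):

```python
def run_probe(recommend, profile_keywords, rounds=3, top_k=5):
    """Collect a profile-aligned behavior stream from a black-box recommender.

    `recommend(history, top_k)` stands in for the black-box system; the probe
    accepts only items matching its profile and records them as history.
    """
    history = []
    for _ in range(rounds):
        slate = recommend(history, top_k)
        # Decision step: keep items consistent with the probe's persona.
        accepted = [item for item in slate if any(k in item for k in profile_keywords)]
        history.extend(accepted)
    return history

# Toy black-box recommender: serves unseen items from a fixed catalog.
catalog = ["action_1", "romance_1", "action_2", "romance_2", "action_3"]
def toy_recommend(history, top_k):
    unseen = [i for i in catalog if i not in history]
    return unseen[:top_k]

stream = run_probe(toy_recommend, profile_keywords=["action"])
```

The resulting `stream` is the behavioral record the evaluation stage consumes.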
| Tool | Version | Description | Check Installation |
|---|---|---|---|
| Python | 3.10 | Backend runtime | python --version |
1. Environment Configuration
Run the following command in your terminal to install the necessary dependencies:
```bash
pip install -r requirements.txt
```
2. Load BERT Encoder
Execute the script to load the twhin-bert encoder:
```bash
python runner/load_twhin_bert.py
```
3. Configuration (API Key)
To use the DeepSeek LLM features, you need to provide your API key from https://platform.deepseek.com/api_keys. You can pass it as an environment variable at runtime without permanently modifying your system settings.
For Linux / macOS / WSL, prefix your command with the variable:
```bash
DEEPSEEK_API_KEY="your_api_key_here" python ../runner/user_profile_initialize.py
```
For Windows (PowerShell), set the variable for the current session before running the script:
```powershell
$env:DEEPSEEK_API_KEY="your_api_key_here"; python ../runner/user_profile_initialize.py
```
You can also download the twhin-bert model from https://huggingface.co/Twitter/twhin-bert-base; after downloading, place it in the `../rec_models/` directory.
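If you wrap your own scripts around these entry points, reading the key in Python is just an environment lookup. A small helper along these lines (the function name and error message are our own, not part of the repo) fails fast when the key is missing:

```python
import os

def require_api_key(name: str = "DEEPSEEK_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the runner scripts.")
    return key
```

Failing at startup with an explicit message is friendlier than an opaque authentication error from the LLM client mid-run.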
We provide experiments using the SASRec recommendation model on the ML-1M dataset, centered on the Task 1 Target Content Discovery analysis.
Phase I: Evolutionary Training
- Multi-Agent Initialization: Initialize the agent metadata.
```bash
python runner/user_profile_initialize.py
```
- Interaction & Fusion: Update the entry point in `epoch.py` to call `runner.epoch.sasrec_ml1m_merge`, then run the script.
```bash
python runner/epoch.py
```
Phase II: Inference & Evaluation
- Interaction & Behavior Acquisition: Update the entry point in `epoch.py` to call `runner.epoch.sasrec_ml1m_debate_epoch20`, then run the script.
```bash
python runner/epoch.py
```
- Systematic Evaluation: Invoke `runner.evaluation.compare_two_profile`, set the paths to the original and evaluation profiles, and run `evaluation.py` to obtain the results.
```bash
python runner/evaluation.py
```
Following the experimental setup and evaluation metrics detailed in Section 5.2, "Controllability across different recommender system architectures," we conducted a series of experiments on the SASRec model. The results are presented below:
Results on MovieLens-1M:

| Interaction Rounds (t) | Coverage ↑ base (100) | Coverage ↑ base_small (27) | Coverage ↑ CtrlBench-Rec | Exploration Efficiency ↓ base (100) | Exploration Efficiency ↓ base_small (27) | Exploration Efficiency ↓ CtrlBench-Rec |
|---|---|---|---|---|---|---|
| t=5 | 5.6% | 2.05% | 2.33% | 2.89 | 2.06 | 1.31 |
| t=10 | 9.68% | 3.91% | 4.71% | 2.98 | 1.84 | 1.39 |
| t=15 | 12.20% | 5.10% | 7.23% | 3.54 | 2.02 | 1.45 |
| t=20 | 15.46% | 6.56% | 8.95% | 3.78 | 2.16 | 1.58 |
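Coverage is not formally defined in this excerpt. Under one common reading, assumed here for illustration only (the fraction of the item catalog surfaced at least once across all probe interaction streams; Section 5.2's exact metric may differ), it could be computed as:

```python
def coverage(recommended_streams, catalog_size):
    """Percentage of the catalog surfaced at least once across all probes.

    NOTE: this is an assumed definition for illustration; the paper's
    Section 5.2 metric may be defined differently.
    """
    unique_items = set()
    for stream in recommended_streams:
        unique_items.update(stream)
    return 100.0 * len(unique_items) / catalog_size

# Three toy probe streams covering 4 distinct items of a 100-item catalog.
streams = [["m1", "m2"], ["m2", "m3"], ["m4"]]
pct = coverage(streams, catalog_size=100)
```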
