[ACL'25] Code for ACL'25 paper "IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory"

IRT-Router

This repo contains the logic to train and evaluate an IRT-Router. For an SAP Routing PoC we compared this technique, specifically the MIRT-BERT flavor, against our own ND custom router.

The workflow: build a routing dataset compatible with the ND router ahead of time, compute the Pareto and baseline optimal frontiers, then analyze those frontiers in the context of this router.

ND's main contributions to this repo are data marshalling and automated evaluation.

Using this code

  1. Run python3.12 -m venv venv && source venv/bin/activate && pip3 install -r requirements-frozen.txt to set up the environment as previously validated.
  2. Look at ./router_train_eval_e2e.sh. This bash script drives the entire router training and evaluation pipeline; each utility it calls can also be used on its own.
  3. To set up a new run, create a 'my_dataset' folder. Name this folder whatever you want.
  4. Inside 'my_dataset' create 'bert_embeddings' and 'frontiers' folders.
  5. Inside 'frontiers', copy the 'baseline_frontier.json' and 'pareto_frontier.json' files from ND CR router training and name them exactly like this.
  6. Rename your ND router compatible training dataset to 'routing_dataset_merged.csv' and put it inside the 'my_dataset' folder.
  7. With these files in their correct locations, run ./router_train_eval_e2e.sh 'my_dataset' to kick off the pipeline.
  8. Final results should be in 'my_dataset/sweep.log' and 'my_dataset/comparison_plog.log'.
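Steps 3–7 above amount to arranging files into a fixed layout. The sketch below does the same thing in Python; the source paths ('nd_cr_output', 'my_training_data.csv') are placeholders for illustration, and the stand-in files it creates are only there so the sketch runs end to end:

```python
from pathlib import Path
import shutil

# Placeholder inputs for illustration; substitute your real ND CR outputs.
src_dir = Path("nd_cr_output")
src_dir.mkdir(exist_ok=True)
for name in ("baseline_frontier.json", "pareto_frontier.json"):
    (src_dir / name).write_text("{}")  # stand-in frontier files
Path("my_training_data.csv").write_text("prompt,model,score\n")  # stand-in dataset

dataset_dir = Path("my_dataset")  # step 3: any folder name works
(dataset_dir / "bert_embeddings").mkdir(parents=True, exist_ok=True)  # step 4
(dataset_dir / "frontiers").mkdir(parents=True, exist_ok=True)        # step 4

# Step 5: the frontier files must keep these exact names.
for name in ("baseline_frontier.json", "pareto_frontier.json"):
    shutil.copy(src_dir / name, dataset_dir / "frontiers" / name)

# Step 6: the ND-compatible training data, renamed exactly like this.
shutil.copy("my_training_data.csv", dataset_dir / "routing_dataset_merged.csv")
```

With this layout in place, the pipeline script only needs the dataset folder name as its argument.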

Oracle and Naive router

  1. Run ./venv/bin/python3.12 naive_routing_analysis.py --input-file ./my_dataset/routing_data_reformatted.csv to see the statistical breakdown of the routing dataset.
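For intuition on what these two baselines mean (a generic sketch, not the actual logic of naive_routing_analysis.py; the model names and scores are made up): a naive router sends every query to the single model with the best average score, while an oracle router picks the best model per query, giving an upper bound on routing quality.

```python
from collections import defaultdict
from statistics import mean

# Toy per-(query, model) scores, purely illustrative.
scores = {
    ("q1", "gpt"): 0.9, ("q1", "llama"): 0.4,
    ("q2", "gpt"): 0.2, ("q2", "llama"): 0.8,
    ("q3", "gpt"): 0.7, ("q3", "llama"): 0.5,
}

by_model = defaultdict(list)
by_query = defaultdict(list)
for (q, m), s in scores.items():
    by_model[m].append(s)
    by_query[q].append(s)

# Naive router: one model for everything -- whichever is best on average.
naive_model = max(by_model, key=lambda m: mean(by_model[m]))
naive_score = mean(by_model[naive_model])

# Oracle router: the best model per query -- an upper bound on any router.
oracle_score = mean(max(v) for v in by_query.values())

print(naive_model, naive_score, oracle_score)
```

The gap between the naive and oracle scores is the headroom a learned router can actually exploit.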

Note (j5, 12/16/25): everything below this line is the original content of the upstream README.

Experiments

Training

We provide training data in the following file:

  • data/train.csv: Training dataset.

You can train the M-IRT router using the following command:

python train_mirt.py

Similarly, to train the N-IRT router, run:

python train_nirt.py

We also provide a trained model checkpoint:

  • mirt_bert.snapshot: uses bert-base-uncased as the embedding model.

Testing

  • data/test1.csv: In-distribution test set.
  • data/test2.csv: Out-of-distribution test set.

To evaluate the M-IRT router on the in-distribution test set, use the following command:

python test_router.py --router mirt --emb_name bert --test_path test1 --a 0.8 --lamda 0.3

Alternatively, you can execute the pre-written script:

sh test.sh

Router Usage

To be continued…
