mope

martingale off-policy evaluation

conda installation

conda env create -f environment.yml

opebet.py contains the code for MOPE (wealth_lb_2d) and its ablations:
- wealth_lb_1d: scalar betting
- wealth_2d: exact wealth maximization
- wealth_lb_2d_individual_qps: individual bets per value on a grid
opebetrp.py contains the code for reward predictors and gated deployment:
- wealth_lb_rp: subtracts the reward predictor control variate from w*r
- wealth_lb_rp_double_hedge: the double hedging strategy
- wealth_lb_gd: confidence sequence for gated deployment
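
For readers new to betting-based bounds, the idea behind the scalar betting ablation can be sketched as follows. This is a minimal illustration and not the code in opebet.py: the function name betting_lower_bound, the fixed bet lam, and the grid search are all assumptions made for the example. It assumes rewards r in [0, 1] and importance weights w >= 0 with mean 1, so the policy value v = E[w*r] lies in [0, 1]. For each candidate v, the wealth prod_t (1 + lam * (w_t*r_t - v)) is a nonnegative martingale when v is the true value, so by Ville's inequality any candidate whose wealth reaches 1/alpha can be rejected.

```python
import numpy as np


def betting_lower_bound(w, r, alpha=0.05, lam=0.5, grid_size=1001):
    """Hypothetical sketch: lower confidence bound on v = E[w * r].

    Uses one fixed bet `lam` for every candidate value, which is simpler
    (and looser) than adaptive betting strategies.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1) so that wealth stays positive")
    # Importance-weighted rewards; nonnegative by assumption.
    x = np.asarray(w, dtype=float) * np.asarray(r, dtype=float)
    grid = np.linspace(0.0, 1.0, grid_size)  # candidate values v
    # Log-wealth per candidate. Each factor 1 + lam * (x - v) is at least
    # 1 - lam > 0 because x >= 0 and v <= 1.
    log_wealth = np.log1p(lam * (x[:, None] - grid[None, :])).sum(axis=0)
    rejected = np.flatnonzero(log_wealth >= np.log(1.0 / alpha))
    # Wealth decreases in v, so the largest rejected candidate is a
    # conservative lower bound; report 0.0 when nothing is rejected.
    return float(grid[rejected.max()]) if rejected.size else 0.0


# Usage on synthetic data (all numbers here are made up for illustration).
rng = np.random.default_rng(0)
n = 2000
actions = rng.integers(0, 2, size=n)        # uniform logging policy
target_prob = np.where(actions == 0, 0.9, 0.1)
w = target_prob / 0.5                       # importance weights, mean 1
r = rng.binomial(1, np.where(actions == 0, 0.8, 0.3)).astype(float)
print(betting_lower_bound(w, r))
```

The bound is evaluated once at the final sample here. Because the wealth argument holds uniformly over time, taking the running maximum of such bounds over prefixes of the data would give a confidence sequence.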
