martingale off-policy evaluation
conda env create -f environment.yml
opebet.py contains the code for MOPE (wealth_lb_2d) and its ablations:
- wealth_lb_1d: scalar betting
- wealth_2d: exact wealth maximization
- wealth_lb_2d_individual_qps: individual bets per value on a grid
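
To give a feel for the scalar betting idea behind wealth_lb_1d, here is a minimal sketch. It is not the code in opebet.py: the function name `betting_lower_bound`, the fixed bet `lam`, and the candidate grid are all assumptions made for illustration, and the actual implementation presumably chooses its bets adaptively. The sketch assumes rewards in [0, 1] and nonnegative importance weights w with mean 1 under the logging policy. For each candidate value v it tracks the wealth prod_t (1 + lam * (w_t * r_t - v)), which is a nonnegative martingale when v is the true value, and rejects v once the wealth reaches 1/alpha (Ville's inequality).

```python
import numpy as np

def betting_lower_bound(w, r, alpha=0.05, lam=0.5, grid_size=1001):
    """Lower confidence bound on E[w * r] via a fixed scalar bet (illustrative).

    Assumes r in [0, 1], w >= 0, and 0 < lam < 1, so every wealth factor
    1 + lam * (w * r - v) stays strictly positive for v in [0, 1].
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    x = np.asarray(w, dtype=float) * np.asarray(r, dtype=float)
    candidates = np.linspace(0.0, 1.0, grid_size)
    # Log-wealth of each candidate v: sum_t log(1 + lam * (x_t - v)).
    log_wealth = np.log1p(lam * (x[None, :] - candidates[:, None])).sum(axis=1)
    # A candidate is rejected once its wealth reaches 1 / alpha.
    rejected = log_wealth >= np.log(1.0 / alpha)
    # With lam > 0 the wealth decreases in v, so the largest rejected grid
    # point is a (slightly conservative) lower bound; 0.0 if nothing is rejected.
    return float(candidates[rejected].max()) if rejected.any() else 0.0

# Toy on-policy data: all weights equal to 1, constant reward 0.8.
print(betting_lower_bound(np.ones(200), np.full(200, 0.8)))
```

A fixed `lam` keeps the sketch short, but it is generally looser than a bet tuned to the data. Any predictable betting rule (one that uses only past observations) preserves the martingale property and hence the validity of the bound.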
opebetrp.py contains code for reward predictors and gated deployment:
- wealth_lb_rp: subtracts the reward predictor control variate from w*r
- wealth_lb_rp_double_hedge: the double hedging strategy
- wealth_lb_gd: confidence sequence for gated deployment