This is a PyTorch implementation of our ICLR 2022 paper, "Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning".
Install the rlkit package (bundled in the `d4rl` directory) with:

```bash
cd d4rl
pip install -e .
```
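As a quick sanity check (a minimal sketch, assuming `gym` and `d4rl` import cleanly after the editable install), you can load one of the datasets used in the commands below:

```python
# Minimal install check: load a d4rl dataset used in the run commands below.
import gym
import d4rl  # importing d4rl registers the offline envs with gym

env = gym.make("walker2d-medium-v2")
dataset = env.get_dataset()  # standard d4rl dataset accessor
print(dataset["observations"].shape, dataset["actions"].shape)
```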
To run PBRL on the MuJoCo environments:

```bash
python examples/pevi_mujoco.py --env walker2d-medium-v2 --gpu 0
```

To run PBRL-Prior on the MuJoCo environments:

```bash
python examples/pevi_mujoco.py --env walker2d-medium-v2 --prior --gpu 0
```

To run PBRL on the Adroit environments:

```bash
python examples/pevi_adroit.py --env pen-cloned-v0 --gpu 0
```

To run PBRL-Prior on the Adroit environments:

```bash
python examples/pevi_adroit.py --env pen-cloned-v0 --prior --gpu 0
```
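To sweep several datasets in one go, a simple launcher along these lines works; the extra environment names are assumptions (any d4rl MuJoCo dataset id should fit), and only the flags shown above are used:

```python
# Hypothetical launcher: run PBRL sequentially over several MuJoCo datasets.
import subprocess

for env_id in ["walker2d-medium-v2", "hopper-medium-v2", "halfcheetah-medium-v2"]:
    subprocess.run(
        ["python", "examples/pevi_mujoco.py", "--env", env_id, "--gpu", "0"],
        check=True,  # stop the sweep if a run crashes
    )
```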
The core implementation is given in `d4rl/rlkit/torch/sac/pevi.py`.
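The central idea there, sketched below in plain PyTorch (an illustration only, not the exact code in `pevi.py`), is to penalize the bootstrapped Q-ensemble's target by its own disagreement, so that poorly-supported actions receive pessimistic values:

```python
# Illustrative sketch of PBRL-style uncertainty penalization (not the exact
# implementation in pevi.py): the std across K bootstrapped critics serves
# as an epistemic-uncertainty estimate that is subtracted from the target.
import torch

def pessimistic_target(next_q, reward, not_done, beta=1.0, gamma=0.99):
    """next_q: (K, batch) next-state Q-values from K bootstrapped critics.
    beta scales the uncertainty penalty; larger beta is more pessimistic."""
    uncertainty = next_q.std(dim=0)              # ensemble disagreement
    penalized_q = next_q.mean(dim=0) - beta * uncertainty
    return reward + gamma * not_done * penalized_q
```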
Data for each run is stored on disk under the result directory as `<env-id>-<timestamp>/<seed>/`. Each run directory contains:
- `debug.log`: records the epoch, Q-values, uncertainty values, and scores.
- `progress.csv`: the same data as `debug.log`, in CSV format.
- `variant.json`: the hyper-parameters used in training.
- `models`: the final actor-critic networks.
The `evaluation/d4rl score` entry in `debug.log` or `progress.csv` records the normalized score reported in our paper.
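For instance, a small helper like the sketch below can pull the score curve out of a finished run; the exact CSV column header is an assumption based on that key:

```python
# Hedged sketch: read the normalized-score column from a run's progress.csv.
# The column name "evaluation/d4rl score" is assumed from the key above.
import csv

def load_d4rl_scores(run_dir):
    """run_dir: a result/<env-id>-<timestamp>/<seed>/ directory."""
    with open(f"{run_dir}/progress.csv", newline="") as f:
        return [float(row["evaluation/d4rl score"]) for row in csv.DictReader(f)]
```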
If you have any questions, bug reports, suggestions, or improvements, please feel free to open an issue.