Official repository for "Direct Advantage Estimation"
We recommend using Python 3.8 with venv. Please make sure pip is up to date by running:
    pip install -U pip
Install requirements:
    pip install -r requirements.txt
To reproduce the results, run the following command:
    python train.py --algo {algo} --hparam_file {hyperparameter_file} --envs {env} --threads {threads}
--algo: PPO (GAE) or CustomPPO (DAE)
--hparam_file: Path to the hyperparameter file. See ./params/ for the hyperparameters used in the paper; the files are named {algo}_{network}.yml
--envs: Environment to train on, e.g., Pong or Breakout. For MinAtar environments, append the suffix -MinAtar-v0 (e.g., Breakout-MinAtar-v0).
--threads: Number of parallel threads for asynchronous environment steps
--logging: Save logs in ./logs/{env}/
--save_model: Save the trained model to ./logs/{env}/
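As an illustration of how the options above fit together, here is a minimal sketch of a shell helper that prints (but does not run) a training command. The function name train_cmd is hypothetical and not part of this repository, the thread count 8 is an arbitrary example value, and {network} is left as a placeholder that must be replaced with the network name of a file that actually exists in ./params/.

```shell
# Hypothetical helper: print the training command for review instead of running it.
# Arguments: algorithm, network name, environment, number of threads.
train_cmd() {
    algo="$1"
    network="$2"
    env_name="$3"
    threads="$4"
    # Hyperparameter files follow the {algo}_{network}.yml naming described above.
    printf 'python train.py --algo %s --hparam_file ./params/%s_%s.yml --envs %s --threads %s\n' \
        "$algo" "$algo" "$network" "$env_name" "$threads"
}

# DAE (CustomPPO) on a MinAtar environment; replace {network} before running the printed command.
train_cmd CustomPPO "{network}" Breakout-MinAtar-v0 8
```

Printing the command first makes it easy to check the hyperparameter file path before starting a long run. Swapping CustomPPO for PPO gives the corresponding GAE baseline, and --logging or --save_model can be added to the printed command as described above.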
To view the TensorBoard logs, run:
    python -m tensorboard --logdir ./logs/
and open the displayed URL in a browser.
To cite this work, please use the following BibTeX entry:
@article{pan2022direct,
title={Direct advantage estimation},
author={Pan, Hsiao-Ru and G{\"u}rtler, Nico and Neitz, Alexander and Sch{\"o}lkopf, Bernhard},
journal={Advances in Neural Information Processing Systems},
volume={35},
pages={11869--11880},
year={2022}
}