forked from opendilab/DI-engine
Merge main DI-engine #3
Open
karroyan wants to merge 314 commits into karroyan:wandb from opendilab:main
* feature(nyz): setup evogym docker * fix(nyz): correct evogym docker
* feature(nyz): add bipedalwalker demo and fix rounding bug * feature(nyz): add seed_api option and ch3 config * fix(nyz): fix env clone bug, add attention policy * polish(nyz): add cloudpickle clone and fix minor bugs * feature(nyz): add evogym env config
* embed metadrive into di-engine * adapt to the latest version of metadrive * adapt to metadrive * adapt metadrive feature * change decision frequency * fix some problems * add show bird view observation * add metadrive in readme.md
* Unsqueeze action_args in PDQN when its shape is 1 * Parameterize PDQN unittest
overide -> override (ci skip)
* polish(pu): polish icm_onppo_config * polish(pu): polish icm rnd intrinsic_reward_weight and config * style(pu): yapf format * fix(lisong): fix config bugs and app_key env bugs * polish(lisong): polish icm/rnd config and reward model * fix(lisong): add viewsizerapper in minigrid_wrapper * fix(lisong): add doorkey8x8 rnd+onppo config,save reward model, fix replay save * fix(pu): fix augmented_reward tb_logging * feat(lisong): add noisy-tv env in minigrid * fix(lisong): modify noisy_tv env
* polish ppof rewardclip and add atari config * polish config * polish style * polish config and ppof * polish style * polish yapf style
* draft runnable version for mdqn and config file * fix style for mdqn * fix style for mdqn * update action_gap part for mdqn * provide tau and alpha * draft runnable version for mdqn and config file * fix style for mdqn * fix style for mdqn * update action_gap part for mdqn * provide tau and alpha * add clipfrac to mdqn * add unit test for mdqn td error * provide current exp parameter * fix bug in mdqn td loss function and polish code * revert useless change in dqn * update readme for mdqn * delete wrongly named folder * rename asterix folder * provide reasonable config for asterix * fix style and unit test * polish code under comment * fix typo in dizoo asterix config * fix style * fix style * provide is_dynamic_seed for collector env * add unit test for mdqn in test_serial_entry with asterix * change test for mdqn from asterix to cartpole because of platform test failed * change is_dynamic structure because of unit test failed at test entry * add comment for is_dynamic_seed * add enduro and spaceinvaders mdqn config file && polish comments * polish code under comment
…input (#451) * feature(rjy): add rocket env * feature(rjy): add rocket env and config * fix(rjy): waiting for cuda fix * config(rjy): add dmc2gym _sac_state_config * config(rjy): modify max_env_step * config(rjy): max_step * feature(rjy): add qac_pixel model * fix(rjy): modify qacpixel model * fix(rjy): waiting to merge * fix(rjy): change config * fix(rjy): modify config and qac * fix(rjy): rm rocket env * fix(pu): fix selection of default_model in sac and qac_pixel template model * polish(pu): delete one redundant linear layer in qac_pixel * polish(pu): add atari-like wrapper option for dmc2gym_env, add share_conv_encoder and embed_action option for qac_pixel model * polish(pu): polish dmc swingup sac config and yapf format * polish(pu): add option of embed_action_density and fix sac when two Q share conv encoder * polish(rjy): docker part * test(rjy): add test for QACPixel model * polish(rjy): modify network para
* feature(pu): add pong and cartpole ddp config of dqn and onppo * fix(pu):fix atari_ppo_ddp.py * polish(pu): polish atari_dqn_ddp.py and atari_ppo_ddp.py * polish(pu): polish atari ddp configs
* Add IQL algo * Polish IQL Algorithm * polish iql
* polish(pu): delete unused enable_fast_timestep argument * polish(pu): delete unused empty lines * polish(pu): delete unused empty lines * style(pu): polish comment's format * style(pu): polish comment's format
* feature(nyz): add rlhf dataset * fix(nyz): fix import bugs * feature(nyz): add vision input support and fix bugs * style(nyz): add comments for rlhf dataset
* test(nyz): polish ppo and add rlhf ppo loss test * interface(nyz): add naive interface about grpo/rloo * test&implement(dcy): add unit tests for GRPO and RLOO - Add test_grpo_rlhf.py for GRPO unit tests - Add test_rloo_rlhf.py for RLOO unit tests - Update GRPO implementation - Update RLOO implementation * polish(dcy): polish grpo and rloo and test unit * (dcy) rloo and grpo * (dcy) redesign avd from reward * (dcy) Polish style:Use selective log-softmax to reduce peak vram consumption * (dcy)small changes * (dcy)git add readme and typing * (dcy) English comment file name and function name changed
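The "selective log-softmax" item above refers to a general trick: when only the log-probability of one chosen token per position is needed, it can be computed as `logit[token] - logsumexp(logits)` without storing the full log-softmax matrix. The following is a minimal pure-Python sketch of that idea, not the code from this commit; the function names are hypothetical, and a real implementation would operate on tensors.

```python
import math
from typing import List, Sequence


def logsumexp(row: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) for one row of logits."""
    m = max(row)
    return m + math.log(sum(math.exp(x - m) for x in row))


def selective_log_softmax(
    logits: Sequence[Sequence[float]], index: Sequence[int]
) -> List[float]:
    """Return log_softmax(logits)[i][index[i]] for every row i.

    Only one scalar per row is produced, so the full (rows x vocab)
    log-probability matrix is never stored.
    """
    return [row[tok] - logsumexp(row) for row, tok in zip(logits, index)]


def full_log_softmax(logits: Sequence[Sequence[float]]) -> List[List[float]]:
    """Reference version that materialises every log-probability."""
    return [[x - logsumexp(row) for x in row] for row in logits]


# Hypothetical toy data: two positions, vocabulary of three tokens.
logits = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
tokens = [0, 2]

selected = selective_log_softmax(logits, tokens)
reference = [full_log_softmax(logits)[i][t] for i, t in enumerate(tokens)]
```

Both paths give the same values; the selective one keeps a single number per row instead of one per vocabulary entry, which is where the memory saving comes from on large vocabularies.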
… jericho config (#860) * feature(pu): adapt to unizero-multitask ddp, and adapt ppo to support jericho config * polish(pu): polish arguments and docstring * fix(pu): add wrongly deleted code snippet * style(pu): yapf format * style(pu): flake8 format * polish(pu): polish docstring, add allreduce_with_indicator method * style(pu): polish type lint * polish(pu): polish allreduce_with_indicator * fix(pu): fix wrongly deleted snippet * fix(pu): fix ding.utils import * style(pu): flake8 format * fix(pu): fix action_mask bug in vac * fix(pu): fix critic_embedding bug in vac, polish some cfgs * style(pu): yapf format * fix(pu): fix self._collector_envstep * style(pu): yapf format
* fix(pu): fix noise layer's usage * polish(pu): polish comments * polish(pu): polish noisy_net config * fix(pu): fix reset_noise bug in noisy_net option * fix(pu): fix enable_noise bug in rainbow * style(pu): yapf format * style(pu): yapf format * style(pu): flake8 format * style(pu): yapf format * polish(pu): polish set_noise_mode when self._cfg.noisy_net is False * feature(pu): add unittest for noise_linear_layer --------- Co-authored-by: puyuan <puyuan1996@qq.com>
* feature(wyx): add three KL-divergence variants * fix bugs and add description for KL-divergence variants * add a period for each comment line * fix bugs and add KL-divergence parameter descriptions * fix flake8 E501 error in ppo.py docstring * fix trailing whitespace in ppo.py
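The commit text does not say which three KL-divergence variants were added. One common set, assumed here purely for illustration, is the sample-based estimators often called k1, k2 and k3, computed from the log-ratio between the new and old policy on actions sampled from the old policy. The sketch below is not taken from `ppo.py`; the function name and inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def kl_estimators(
    logp_new: Sequence[float], logp_old: Sequence[float]
) -> Tuple[float, float, float]:
    """Estimate KL(old || new) from samples drawn under the old policy.

    With log_ratio = logp_new - logp_old for each sample:
      k1 = mean(-log_ratio)                    unbiased, can be negative
      k2 = mean(0.5 * log_ratio ** 2)          biased, always >= 0
      k3 = mean(exp(log_ratio) - 1 - log_ratio)  unbiased, always >= 0
    """
    log_ratio = [n - o for n, o in zip(logp_new, logp_old)]
    count = len(log_ratio)
    k1 = sum(-lr for lr in log_ratio) / count
    k2 = sum(0.5 * lr * lr for lr in log_ratio) / count
    k3 = sum(math.expm1(lr) - lr for lr in log_ratio) / count
    return k1, k2, k3


# Hypothetical log-probabilities of four sampled actions.
old = [-1.2, -0.7, -2.3, -0.4]
new = [-1.0, -0.9, -2.0, -0.5]

k1, k2, k3 = kl_estimators(new, old)
same = kl_estimators(old, old)  # identical policies give zero for all three
```

In this framing k1 has high variance and may go negative on a small batch, while k2 and k3 stay non-negative, which is the usual reason for offering more than one option.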
Description
Related Issue
TODO
Check List