TRPO example that reproduces the "Deep Reinforcement Learning that Matters" paper by muupan · Pull Request #449 · chainer/chainerrl

muupan · 2019-05-01T10:16:38Z

Merge ~~#446~~ ~~#448~~ first.

This PR adds a script to reproduce the TRPO results in "Deep Reinforcement Learning that Matters" (http://arxiv.org/abs/1709.06560).

As the tables in the README show, our results are competitive with theirs: https://github.com/muupan/chainerrl/tree/trpo-mujoco-matters/examples/mujoco/reproduction/trpo

muupan · 2019-08-27T05:52:28Z

/test

pfn-ci-bot · 2019-08-27T05:52:32Z

Successfully created a job for commit b0b017a:

Dashboard for commit b0b017a

toslunar · 2019-09-06T06:43:52Z

examples/mujoco/reproduction/trpo/train_trpo.py

+
+    # Normalize observations based on their empirical mean and variance
+    obs_normalizer = chainerrl.links.EmpiricalNormalization(
+        obs_space.low.size, clip_threshold=5)


I didn't find clip_threshold=5 in the paper, but 5 seems sufficiently large.

They used a fork of openai/baselines. Its TRPO uses MlpPolicy, which internally clips normalized observations to [-5, 5].
https://github.com/Breakend/baselines/blob/master/baselines/trpo_mpi/run_mujoco.py#L31
https://github.com/Breakend/baselines/blob/master/baselines/ppo1/mlp_policy.py#L25
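To make the effect of `clip_threshold=5` concrete, here is a minimal pure-Python sketch of running observation normalization followed by clipping. `RunningNormalizer` is a hypothetical stand-in written for illustration; it is not ChainerRL's `EmpiricalNormalization` or the baselines code, and the `eps` value and exact update rule are assumptions.

```python
import math


class RunningNormalizer:
    """Hypothetical sketch: normalize by running mean/std, then clip."""

    def __init__(self, size, clip_threshold=None, eps=1e-2):
        self.mean = [0.0] * size
        self.var = [1.0] * size
        self.count = 0
        self.clip_threshold = clip_threshold
        self.eps = eps  # assumed small constant to avoid division by zero

    def update(self, obs):
        # Incrementally update the per-dimension mean and population variance.
        self.count += 1
        rate = 1.0 / self.count
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += rate * delta
            self.var[i] += rate * (delta * (x - self.mean[i]) - self.var[i])

    def __call__(self, obs):
        out = []
        for i, x in enumerate(obs):
            z = (x - self.mean[i]) / (math.sqrt(self.var[i]) + self.eps)
            if self.clip_threshold is not None:
                # Clip the normalized value to [-clip_threshold, clip_threshold].
                z = max(-self.clip_threshold, min(self.clip_threshold, z))
            out.append(z)
        return out


norm = RunningNormalizer(size=1, clip_threshold=5)
norm.update([0.0])
norm.update([2.0])  # running mean is now 1.0, variance 1.0

print(norm([1.0]))     # at the running mean, so unaffected by clipping
print(norm([1000.0]))  # an extreme outlier is clipped to [5.0]
```

With a threshold of 5, only observations more than roughly five standard deviations from the running mean are changed, which is consistent with the remark above that 5 is sufficiently large.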

muupan · 2019-09-06T07:21:04Z

/test

pfn-ci-bot · 2019-09-06T07:21:07Z

Successfully created a job for commit 8762890:

Dashboard for commit 8762890

toslunar

LGTM!

muupan · 2019-09-06T09:17:40Z

/test

pfn-ci-bot · 2019-09-06T09:17:43Z

Successfully created a job for commit dfdcf5d:

Dashboard for commit dfdcf5d

muupan added 18 commits April 29, 2019 01:44

Add a script for TRPO to reproduce the matters paper

16f8422

Merge branch 'recurrent-trpo' into trpo-mujoco-precompute

c3427ec

Merge branch 'ppo-mujoco-matters' into trpo-mujoco-precompute-orthogonal

947568e

Use orthogonal for trpo

71ee5a5

Merge branch 'recurrent-trpo' into trpo-mujoco-precompute-orthogonal

640be41

Merge branch 'recurrent-trpo' into trpo-mujoco-precompute-orthogonal

15a4ce1

Add README for TRPO

3a4c0c2

Clean train_trpo.py

1f741f0

Set clip_threshold=5

6a3adf4

Merge branch 'trpo-mujoco-matters-clip5' into trpo-mujoco-matters

2bd9e51

Add results and learning curve images

57e4b07

Merge branch 'ppo-mujoco-matters' into trpo-mujoco-matters

e24dc82

Clean options of train_trpo.py

c0749dc

Add mujoco/trpo to test_examples.sh

20afa40

Merge branch 'master' into trpo-mujoco-matters

8b6fa88

Move example dir

c5d27e2

Add a test script

2347bf9

Merge branch 'master' into trpo-mujoco-matters

b0b017a

muupan changed the title ~~[WIP] TRPO example that reproduces the "Deep Reinforcement Learning that Matters" paper~~ TRPO example that reproduces the "Deep Reinforcement Learning that Matters" paper Aug 27, 2019

muupan requested a review from toslunar August 27, 2019 06:31

muupan added 2 commits August 29, 2019 16:36

Merge branch 'master' into trpo-mujoco-matters

72bf532

Merge branch 'master' into trpo-mujoco-matters

2cc6ba3

toslunar reviewed Sep 6, 2019

muupan and others added 3 commits September 6, 2019 16:08

Update examples/mujoco/reproduction/trpo/train_trpo.py

b3d28fe

Co-Authored-By: Toshiki Kataoka <tos.lunar@gmail.com>

Add max_kl=0.01 to make it clear

a1446a8

It should not affect the behavior as the default value of the argument is also 0.01.

Merge branch 'master' into trpo-mujoco-matters

8762890

muupan requested a review from toslunar September 6, 2019 07:21

toslunar approved these changes Sep 6, 2019

Merge branch 'master' into trpo-mujoco-matters

dfdcf5d

muupan merged commit 63dad78 into chainer:master Sep 6, 2019

muupan deleted the trpo-mujoco-matters branch September 6, 2019 11:54

muupan added the example label Feb 6, 2020

muupan merged 24 commits into chainer:master from muupan:trpo-mujoco-matters
